THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume XXXII September, 1941 Number 6 


PERMANENCE OF RETENTION 
OF FIRST-YEAR ALGEBRA 


SISTER M. FLORENCE LOUISE LAHEY 
Marygrove College, Detroit 


Upon the findings of studies in the retention of high-school subjects 
depends the solution of many of the problems of the secondary-school 
curriculum. The effectiveness of varying amounts of overlearning, 
the degree of difficulty of curriculum content, the amount of practice 
necessary for the mastery of specific units of material, the methods of 
instruction that insure maximum retention, and the most satisfactory 
placement of content in the curriculum are a few of the many edu- 
cational implications of retention investigations. 

Though there has been considerable recent research in the teaching 
of algebra, few attempts have been made to measure the retention of 
material taught in the first-year course. Table I contains a brief 
summary of the previous experimental studies concerned with the 
permanence of retention of elementary algebra. 

The college seniors used as subjects by Thorndike¹³ came from 
different schools and he had no record of their high-school achievement 
in algebra. The tests used to measure retention consisted of but five 
problems involving fundamentals and were not identical with those 
given to the various subjects at the termination of their courses in 
algebra. The results reported under these conditions are mere esti- 
mates. Thorndike himself says they are “hardly more than hints, 
valuable—to stimulate to adequate experiments.” 

Forms I and II of the Douglas-Survey tests, used by Worcester, 
measure retention of first-semester algebra. Worcester attributes the 
high retention of this material to practice and overlearning during the 
second semester. The subjects, it is evident, studied algebra during 
part of the intervals between tests and retests. Under these conditions 
the only retention determined without intervenient practice, and 
perhaps instruction, was that of second-semester work. This was 
measured by Form B-I. The amount of forgetting was much more 
than on the first-semester course, even though the time between the 
test and retest was less. 

TABLE I.—SUMMARY OF PREVIOUS STUDIES IN ALGEBRA RETENTION 

Author and year    Number of     Grade             Test used                Time elapsed           Per cent
                   subjects                                                                        retained

Thorndike (1922)   189           College seniors   Constructed by author    Varied with subjects
Worcester (1928)   22            IX                Douglas-Survey
                                                     Form A-I               10 months              109.00
                                                     Form A-II               9 months               84.68
                                                     Form B-I                7 months               35.00
White (1930)       326           IX                Specially constructed
                                                   for investigation
                                                     Test I                  3 months               40.97
                                                                             9 months               23.33
                                                                            16 months               23.37
                                                     Test II                 5 months               67.2
                                                                             8 months               68.5
                                                                            16 months               69.99
Layton (1932)      51            IX                New York Regents
                   (39 girls,                        Part I                 11 months               57.3
                   12 boys)                          Part II                11 months               71.4


Test I of White’s investigation¹⁷ was, on the whole, a test in fundamental 
operations. Test II was diagnostic and measured both skill in 
manipulative technique and problem solving ability. The major part 
of Test II, however, involved the solving of problems. Retention of 
verbal problems appears to be much superior to that of algebra 
fundamentals. The computational errors in this study, however, are 
so many and so gross that but little confidence can be placed in the 
accuracy of the figures given. The study is reported in this investi- 
gation for the sake of completeness in summarizing previous research 
in algebra retention. 

Part I of the New York Regents examination for 1928, used by 
Layton,⁶ was “a comprehensive test on manipulative technique.” 




Part II consisted of verbal problems. Forgetting in the latter is again 
found to be much less than in the fundamentals. 


THE PROBLEM 


The present study was an outgrowth of a previous investigation by 
Kellar. During the last two weeks of May, 1938, she gave a series 
of tests to the first-year algebra pupils of eight Detroit parochial 
schools. A summary of the research, entitled “The Relative Contribution 
of Certain Factors to Individual Differences in Algebraic 
Problem Solving Ability,” appeared in the September, 1939, issue of 
the Journal of Experimental Education. The results of the testing 
procedure employed in this study formed the basis for the present 
investigation. To secure measures of retention the tests in algebra 
fundamentals and problem-solving were administered again in Septem- 
ber, 1938; toward the close of January, 1939; and during the last week 
of May, 1939. 

The purposes of the study were to determine as accurately as 
possible under the conditions of the experiment: 


(1) The amount of knowledge of elementary algebra retained over a 
period of a year when no pupil participating in the investigation studied 
algebra but during which all received instruction in geometry. 

(2) The relation of intelligence to the retention of fundamental opera- 
tions and to the ability to solve problems in algebra. 

(3) Differences in the retention of algebra computational and problem- 
solving ability. 

(4) Sex differences in the retention of algebra fundamentals and the 
ability to solve verbal problems. 


This study differs from the preceding investigations in that all 
students participating in the experiment were studying geometry 
during the retention intervals. The subjects used in Worcester’s 
research¹⁸ were receiving instruction in algebra during part of the 
interval between tests and retests. In Layton’s study⁶ the participants 
received no mathematics instruction of any kind during the intervals; 
in the study of White¹⁷ some pupils were studying geometry while 
others had discontinued the study of mathematics. Except in the 
correlations found to determine the influence of effort, the scores of 
the two groups were not separated. 

The subjects used in the investigation were two hundred twenty- 
nine ninth-grade pupils of seven Detroit parochial high schools. 
Included in this number were all who had participated in the experimental 
procedure, previously described, for whom complete records of 
former and subsequent tests were available and who were studying 
geometry during the entire year following the initial administration 
of the tests. These subjects were typical of the average elementary 
algebra class of an urban center, and, it may be assumed, formed a 
representative sampling of ninth-grade algebra pupils in age, ability, 
and socio-economic status. Several reasons may be alleged for this 
assumption. All the ninth-grade pupils studying algebra for the first 
time in the seven experimental schools were used as subjects. The 
average age of the girls, one hundred fifty-three in number, was fourteen 
years, nine months; of the seventy-six boys it was fifteen years. These 
age measures are typical of ninth-grade pupils.'* The ability 
measurement showed that the intelligence of the group varied little 
from that of the corresponding age groups used by McManama⁸ in 
the standardization of the test used to measure cognitive ability. 
Finally, the schools selected to participate in the study were located 
in various parts of Detroit widely different in social and economic 
environment. 

The algebra tests used were those described in the study to which 
reference has been made. They were devised by the author of the 
previous study⁵ and “based on a study of representative textbooks 
and such standardized tests as were applicable.” The test in fundamental 
operations consisted of fifty computation problems including 
subject-matter usually taught in first-year algebra through quadratics. 
Problem-solving ability was measured by Forms A and B of the special 
tests. Each form consisted of fifteen problems, with a total of sixty 
points possible on the two tests. The type and difficulty of the problems 
were determined in accordance with the findings of Varnhorn.¹⁵ 
As a measure of intelligence “Exercises in Cognitive Ability,” Form A, 
developed by Sister Maurice McManama,⁸ was employed. Since it 
contains no section in mathematics this test gives a measure of cognitive 
ability without any overlapping with a mathematical factor. It consists 
of five sections: Discrimination, Analogy, Completion, Definition, 
and Proverbs. These tests satisfy approximately the tetrad criterion, 
show a low residual correlation when the general factor is partialed 
out and a high correlation (.989) with the underlying general factor. 


THE EXPERIMENT 


The retests in the fundamental operations test and the two forms 
of the problem-solving tests were given about two weeks after the 
opening of school in September, thus furnishing a measure of summer 
vacation retention of algebra. By retesting again at the end of the 
months of January and May, intervals between the tests of approximately 
four, eight, and twelve months were secured. The time limit 
for each of the three tests, at the initial testing and at each of the three 
retests, was one class period of forty to forty-five minutes. By allowing one 
point for each space correctly filled, complete objectivity of scoring 
was secured. All tests were given to the pupils by their regular home- 
room teachers, but in compliance with detailed instructions supplied 
by the investigator. In no case were the subjects told that the tests 
would be repeated. As none of the pupils were studying algebra 
during the year in which the experiment was conducted, it is reasonable 
to suppose that no reviewing or practice, except the incidental 
repetition required for solving problems in geometry, occurred during 
the retention intervals. 

Tables II and III give the means, the standard deviations, the standard 
errors of the means, and the percentages of the original scores retained 
in September, January, and May for the fundamental operations test 
and the combined score for the two problem-solving tests. 
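
The standard errors and percentages reported for the fundamental operations test can be checked from the printed means and standard deviations. The following Python sketch assumes that the standard error of the mean is the standard deviation divided by the square root of the number of pupils, and that the percentage retained is the retest mean divided by the May, 1938, mean. The function names are illustrative, and the assignment of the four rows to the four test dates is an assumption based on the order of testing.

```python
import math

N = 229  # number of pupils, as printed in Table II

# (mean, standard deviation) for the fundamental operations test.
# ASSUMPTION: the four printed rows follow the order of the four test dates.
fundamentals = {
    "May 1938": (13.34, 6.82),
    "September 1938": (12.10, 6.52),
    "January 1939": (10.74, 6.24),
    "May 1939": (10.79, 6.66),
}


def standard_error_of_mean(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def percent_retained(retest_mean, initial_mean):
    """Retest mean expressed as a percentage of the initial mean."""
    return 100.0 * retest_mean / initial_mean


initial_mean = fundamentals["May 1938"][0]
for label, (mean, sd) in fundamentals.items():
    sem = standard_error_of_mean(sd, N)
    pct = percent_retained(mean, initial_mean)
    print(f"{label:15s} SE of mean = {sem:.3f}   retained = {pct:.1f} per cent")
```

The recomputed figures agree with the printed ones to within rounding of the means; small discrepancies in the last decimal place are to be expected when working from rounded values.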

In fundamental operations there is a loss of ten per cent over the 
summer vacation and of another ten per cent during the first semester. 
Retention for the eight- and twelve-month intervals is practically the 
same. This shows a large initial forgetting with the curve then 
becoming “almost parallel to its axis,” the familiar forgetting phenomenon, 
also noted by White¹⁷ in similar algebraic operations. 

TABLE II.—FUNDAMENTAL OPERATIONS 
Number: 229 

Test date          Possible score   Mean    SD     σM     Per cent retained
May, 1938          50               13.34   6.82   .451
September, 1938    50               12.10   6.52   .431   90.7
January, 1939      50               10.74   6.24   .412   80.5
May, 1939          50               10.79   6.66   .440   80.8

TABLE III.—PROBLEM-SOLVING 
Number: 229 

Test date              Possible score   Mean    SD      σM     Per cent retained
May, 1938              60               31.02   10.04   .664
[date not legible]     60               34.27   10.24   .677   110.5
[Remaining rows not legible in the source.]

In problem-solving achievement, instead of a loss, the September 
mean is slightly above that of the initial test, and there are noticeable 
increases at both the eight- and the twelve-month intervals. Several 
explanations may be advanced for this rather unexpected result. 
Besides the normal growth and development in mental ability occurring 
during the space of a year, there must have been some transfer from 
the solving of originals in geometry to the solution of algebraic 
problems. Many geometry problems involve algebraic procedures, so 
at least some practice was secured in their use and application during 
the year of the experiment. Another explanation has to do with the 
tests themselves. As noted by Kellar,⁵ the author of the tests used, 
the verbal problem-solving tests, though representative of the average 
high-school algebra textbook and standardized test, were “not of such 
a level of difficulty as to demand any considerable degree of cognitive 
power.” Analysis shows, moreover, that they did not require the 
application of the more difficult computations, but rather the use of 
fundamentals commonly understood and taught, on the whole, during 
the first semester so that there was considerable application of them 
during the second semester. Familiarity with the computational 
processes involved in solving the problems prevented any unfavorable 
mental set against attempting the solution of the problems at the 
retests. A final explanation is concerned with the method that was 
used in the teaching of algebra in the experimental schools. Problem- 
solving was stressed almost from the first day of the study of algebra. 
Habits of reflective thinking and training in the forming of generaliza- 
tions were formed and strengthened by constant application during 
the year. Geometry was taught in the same way, and growth in 
ability to do reflective thinking, it is reasonable to suppose, was 
expressed in an increased ability to solve the problems demanding 
intellectual insight. Though White¹⁷ and Layton⁶ did not find an 
actual improvement in achievement in problem-solving, both investigators 
discovered less forgetting in this area than in that of computational 
ability. The apparent improvement discovered in a year during 
which geometry was studied, and not found in studies where the sub- 
jects were not studying geometry, may be a further argument for 
placing second-year algebra in the junior year of high school, permitting 
the placement of geometry in the sophomore year. 



Tables IV and V give the intercorrelations of the retention scores 
after each of the three intervals for the fundamental operations test and 
the problem-solving tests. 


TABLE IV.—INTERCORRELATIONS OF TEST SCORES AT INTERVALS OF FOUR, EIGHT, 
AND TWELVE MONTHS 
Fundamental Operations, Number: 229 

                   May, 1938      September, 1938   January, 1939   May, 1939
May, 1938          ...            68.9 ± .023       71.4 ± .021     69.3 ± .023
September, 1938    68.9 ± .023    ...               81.8 ± .015     83.1 ± .013
January, 1939      71.4 ± .021    81.8 ± .015       ...             83.2 ± .013
May, 1939          69.3 ± .023    83.1 ± .013       83.2 ± .013     ...


TABLE V.—INTERCORRELATIONS OF TEST SCORES AT INTERVALS OF FOUR, EIGHT, 
AND TWELVE MONTHS 
Problem-solving 

                   May, 1938      September, 1938   January, 1939   May, 1939
May, 1938          ...            70.7 ± .022       66.8 ± .025     74.2 ± .019
September, 1938    70.7 ± .022    ...               80.1 ± .016     77.4 ± .018
January, 1939      66.8 ± .025    80.1 ± .016       ...             85.0 ± .012
May, 1939          74.2 ± .019    77.4 ± .018       85.0 ± .012     ...


The relationships in both fundamental operations and problem-solving 
are quite significant. In fundamental operations the correlations 
for September, 1938, and May, 1939 (83.1), and for January and 
May, 1939 (83.2), are particularly significant. Those pupils who 
remembered more in September continued to remember more in 
January and in May. In problem-solving the correlations for 
the September and January scores (80.1) and for the January and May 
scores (85.0) are markedly stable. 
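
The figure following each coefficient in Tables IV and V appears to be a probable error. The paper does not state the formula used; the sketch below assumes the classical probable error of a correlation coefficient, 0.6745(1 − r²)/√N, and treats the printed coefficients as r multiplied by one hundred. The function name is illustrative only.

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a correlation: 0.6745 * (1 - r^2) / sqrt(n).

    ASSUMPTION: this is the formula behind the printed +/- figures;
    the source does not say so explicitly.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


N = 229  # number of pupils

# Coefficients as printed (r x 100) with their printed probable errors.
printed = {
    "fundamentals, September 1938 and January 1939": (81.8, 0.015),
    "problem-solving, September 1938 and January 1939": (80.1, 0.016),
    "problem-solving, January 1939 and May 1939": (85.0, 0.012),
}

for label, (r_times_100, printed_pe) in printed.items():
    pe = probable_error_of_r(r_times_100 / 100.0, N)
    print(f"{label}: computed PE = {pe:.3f}, printed PE = {printed_pe:.3f}")
```

Under this assumption the three recomputed values match the printed ones; a few other entries in the tables differ from the formula by one unit in the third decimal place, which may reflect truncation or the use of printed tables.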

Tables VI and VII show the correlations between intelligence and 
the scores on the four tests. 

The closer relationship between intelligence and retention of 
problem-solving ability is evident. The same tendency was noted 
in the studies of Layton⁶ and White.¹⁷ In fundamental operations the 
coefficient decreases significantly as the interval increases. Retention 
of ability in algebra fundamentals, therefore, has a slight but positive 
relation to intelligence. The coefficients for problem-solving also show 
a tendency to decrease, particularly at the twelve-month interval. 
There appears, however, a significant relation between intelligence 
and the retention of problem-solving ability. 


TABLE VI.—INTELLIGENCE AND FUNDAMENTAL OPERATIONS 
Number: 229 
[Columns: r, PE. The entries are not legible in the source.]

TABLE VII.—INTELLIGENCE AND PROBLEM SOLVING 
Number: 229 
[Columns: r, PE. The entries are not legible in the source.]

To compare retention of boys and girls, seventy-six of each group 
were matched for intelligence and chronological age. The average 
age of the boys was fifteen years; of the girls, fourteen years eleven and 
four-tenths months. As measured by the tests previously described, 
the average intelligence was 134.72 for the boys, and 134.71 for the 
girls. Tables VIII, IX, X, and XI show the results of this part of the 
study. 

TABLE VIII.—FUNDAMENTAL OPERATIONS 
76 boys, average age, 15 years; average intelligence, 134.72 

Test date              Mean    SD     σM    Per cent retained
May, 1938              11.74   7.08   .81
September, 1938        11.20   6.38   .73   95.4
[date not legible]      9.55   6.94   .80   81.4
[One row not legible in the source.]


TABLE IX.—FUNDAMENTAL OPERATIONS 
76 girls, average age, 14 years 11.4 months; average intelligence, 134.71 

Test date              Mean    SD     σM    Per cent retained
[date not legible]     11.21   6.37   .73   82.3
[Remaining rows not legible in the source.]

TABLE X.—PROBLEM-SOLVING 
76 boys, average age, 15 years; average intelligence, 134.72 

Test date              Mean    SD      σM     Per cent retained
[date not legible]     34.37   10.83   1.24   105.2
[date not legible]     34.30   10.69   1.23   105.1
[Remaining rows not legible in the source.]

TABLE XI.—PROBLEM-SOLVING 
76 girls, average age, 14 years 11.4 months; average intelligence, 134.71 
[Columns: Mean, SD, σM, Per cent retained. The entries are not legible in the source.]

In every comparison for algebra fundamentals the mean score of 
the girls exceeds that of the boys, a fact also noted by Pease.¹⁰ In the 
present study, however, the differences are not statistically significant 
and support Thorndike’s conclusion¹⁴ that “the sexes are of approximately 
equal ability” in algebra. The percentage of retention is 
practically the same for each group. 

In problem-solving the average score of the boys surpasses that of 
the girls at the initial test and at each retest. Again, the differences 
are not three times the standard errors of the differences and, therefore, 
lack statistical significance. Greater variations were found within 
the sexes than between the sexes. Though the boys appear to have a 
slight advantage in problem-solving achievement, in retention girls 
appear to be superior to boys. The differences of five and four-tenths 
per cent and six per cent for January and May are not large, however, 
and may be due to greater effort expended by the girls on a task that 
had probably become tedious at the third and fourth repetition. 
Because of this apparently superior retention, the difference in the 
scores of the girls and boys decreases with time. In May, 1938, the 
difference in the mean scores was 2.79; in May, 1939, it was only 1.26. 
The tendency for girls to exceed boys in retention scores was also noted 
by Layton⁶ and White.¹⁷ 
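
The criterion of significance used above, a difference at least three times its standard error, can be illustrated with the few figures that survive in Tables VIII and IX. The sketch below is offered under two assumptions: that the standard error of the difference is the square root of the sum of the squared standard errors (the formula for uncorrelated means, which the paper does not confirm for its matched groups), and that the two retest rows used refer to the same test date. The function name is illustrative.

```python
import math


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Difference between two means divided by the standard error of the difference.

    ASSUMPTION: the means are treated as uncorrelated; the paper does not
    state which formula was applied to the matched groups.
    """
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_difference


# Retest rows for fundamental operations, girls (Table IX) and boys (Table VIII).
# ASSUMPTION: both rows come from the same retest; the date labels are lost.
girls_mean, girls_se = 11.21, 0.73
boys_mean, boys_se = 9.55, 0.80

ratio = critical_ratio(girls_mean, girls_se, boys_mean, boys_se)
print(f"critical ratio = {ratio:.2f}")
print("meets the three-standard-error criterion:", abs(ratio) >= 3.0)
```

On these figures the ratio falls well short of three, which is in keeping with the statement that the sex differences lack statistical significance.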

Correlations between the scores for the fundamental operations 
test and the combined scores on the two tests of verbal problems were 
found for the matched boys and girls separately and for the entire 
group of two hundred twenty-nine pupils. These are given in Table 
XII. 


TABLE XII.—FUNDAMENTAL OPERATIONS AND PROBLEM SOLVING 

                   76 boys        76 girls       Total group (229)
May, 1938          77.5 ± .031    76.5 ± .032    65.3 ± .025
September, 1938    66.2 ± .043    66.1 ± .044    62.3 ± .027
January, 1939      68.5 ± .041    50.7 ± .057    59.6 ± .029
May, 1939          63.6 ± .046    48.8 ± .058    40.5 ± .037


For the total number, as well as for the matched groups, the correla- 
tion coefficients decline steadily during the course of the year, 
making it appear that the knowledge of fundamentals and the ability 
to solve problems become less closely related as length of time between 
test and retest increases. 

With the exception of the September, 1938, correlations, the coefficients 
are consistently higher for boys than for the girls. Kellar⁵ has 
shown that “facility in algebra computation is by far the most important 
factor in ability to solve algebra verbal problems.” The knowledge 
of fundamentals apparently possessed a more functional value 
for boys than for girls, since their scores in problem-solving display a 
greater and more consistent tendency to rise as the scores in funda- 
mentals increase. 

The total group consisted of one hundred forty-six girls and 83 
boys. The larger number of girls accounts for the lower coefficients 
of the total experimental group. 




SUMMARY 


In May, 1938, an intelligence test, an algebra test involving funda- 
mental operations, and two forms of an algebra problem-solving test 
were administered to the first-year algebra classes of eight Detroit 
parochial high schools. To measure retention, the three algebra tests 
were given again to two hundred twenty-nine subjects in seven of these 
schools in September, 1938, and in January and May, 1939. Means, 
standard deviations, standard errors of the means, and the percentage 
of the original score retained over the four-, eight-, and twelve-month 
intervals were computed separately for fundamental operations and 
for problem-solving. Intercorrelations of the test results for the 
various intervals were computed and the coefficients of correlation 
between intelligence and the scores for the tests and retests found. A 
group of seventy-six boys was matched with an equal number of girls 
for age and intelligence. Means, standard deviations, and the per- 
centage of algebra retained by the two groups in both computational 
and problem-solving ability were compared. Correlations were found 
between the scores in fundamentals and verbal problems. During the 
year over which retention was studied all subjects used in the investi- 
gation received instruction in geometry. 

From the data obtained under these conditions the following conclusions 
are drawn: 

(1) A loss of twenty per cent in achievement in fundamental oper- 
ations occurred during the first eight-month interval. After that there 
was no further loss. 

(2) In problem-solving achievement, instead of a loss, there was 
improvement over the twelve-month period during which the pupils 
studied geometry. In the entire group of two hundred twenty-nine 
cases a gain of ten per cent was noted; in a group of seventy-six girls 
the gain was over fifteen per cent. 

(3) Correlation coefficients between intelligence scores and reten- 
tion scores in fundamental operations were positive but low. As time 
elapsed, the coefficients were lower than at the end of the learning 
period. (Range 23.5 to 11.2.) 

(4) Correlations between intelligence and retention of ability to 
solve verbal problems were higher. (Range 49.9 to 44.6.) Retention 
of ability in problem-solving, therefore, appears to be much more 
closely related to intelligence than is retention of computational power. 
The tendency for the coefficient to decrease as the length of the interval 
was increased gives some evidence that retention is not so closely 
related to intelligence as is achievement. This conclusion needs 
further verification. 

(5) In every case the girls made higher scores than the boys in 
algebra fundamentals. The differences, however, were not statis- 
tically significant. Percentages of retention for the two groups were 
practically the same. 

(6) The correlation coefficients between computational and 
problem-solving ability decreased with time for both sexes but were 
consistently higher for boys than for girls. 

(7) In problem-solving the boys surpassed the mean score of the 
girls on every test. In no case, however, were the differences statisti- 
cally significant. Sex differences in algebra achievement, therefore, 
appear to be negligible. In percentage of retention there appeared 
to be a slight difference in favor of the girls. Again, the significance 
of the difference was doubtful. 

(8) The results of this study conform, on the whole, to the findings 
of previous investigations of algebra retention. Differences in degrees 
of retention may be attributed largely to the tests used in the studies 
compared. 


BIBLIOGRAPHY 


1. Bonner, H. R.: Statistics of City School Systems. Bulletin, Department of the 
Interior, Bureau of Education, 1920. 

2. Buckingham, Guy E.: A Study of the Nature, Frequency, and Persistence of 
Errors Made by Students of First-year Algebra in the Four Fundamental Processes 
of Addition, Subtraction, Multiplication and Division of Monomials. 
Ph. D. Dissertation, Northwestern University, 1930. 

3. Douglas, H. R.: “Permanency of Retention of Learning in Secondary-school 
Mathematics.” Mathematics Teacher, Vol. XXIX, 1936, pp. 287-288. 

4. Jackson, Nelson A.: “Learning in First-year Algebra.” School Science and 
Mathematics, Vol. XXXI, 1931, pp. 980-987. 

5. Kellar, W. R.: “The Relative Contribution of Certain Factors to Individual 
Differences in Algebraic Problem-solving Ability.” Journal of Experimental 
Education, Vol. VIII, 1939, pp. 26-35. 

6. Layton, Edna Thompson: “Persistence of Learning in Elementary Algebra.” 
J. Educ. Psychol., Vol. XXIII, 1932, pp. 46-55. 

7. Mason, Nellie C.: A Study in the Retention of Junior High School Mathematics. 
Master’s Thesis, University of Minnesota, 1932. 

8. McManama, Sr. Maurice: “A Genetic Study of the Cognitive General Factor 
in Human Intelligence.” Studies in Psychology and Psychiatry, Catholic 
University of America, Vol. IV, 1936, pp. 35. 

9. Monaghan, Edward A.: “Major Factors in Cognition.” Studies in Psychology 
and Psychiatry, Catholic University of America, III, No. 5, 1935. 


10. Pease, Glenn R.: “Sex Differences in Algebraic Ability.” J. Educ. Psychol., 
Vol. XXI, 1930, pp. 712-714. 

11. Schrepel, M., and Laslett, H. R.: “On the Loss of Knowledge by Junior High-School 
Pupils over the Summer Vacation.” J. Educ. Psychol., Vol. XXVII, 
1936, pp. 229-303. 

12. Stokes, Claude N.: Comparative Study of the Results of a Certain Individual 
Method and a Certain Group Method of Instruction in Ninth-grade Mathematics. 
New York, Henry Holt and Co., Inc., 1931. 

13. Thorndike, E. L.: “Permanence of School Learning.” School and Soc., Vol. 
XV, 1922, pp. 625-627. 

14. Thorndike, E. L.: The Psychology of Algebra. New York, Macmillan, 1923. 

15. Varnhorn, Mary C.: A Study of the Distribution of Verbal Problems in Some 
Modern Algebra Tests. Unpublished study, Department of Education, 
Catholic University of America, 1938. 

16. Ward, Roscoe H.: The Determination, by Objective Tests of the Persistency of 
Errors in the Four Fundamentals of Algebra. M. A. thesis, University of 
Pittsburgh. 

17. White, A.: Retention of Elementary Algebra through Quadratics. Ph. D. Dissertation, 
Johns Hopkins University, 1930. 

18. Worcester, D. A.: “The Permanence of Learning in High-School Subjects: 
Algebra.” J. Educ. Psychol., Vol. XIX, 1928, pp. 343-345. 

19. Wulff, Margaret Ann: The Retention of Junior High School Mathematics. 
M. A. thesis, University of Minnesota, 1932. 




SOME DIFFICULTIES IN THE APPLICATION OF THE 
ANALYSIS OF COVARIANCE METHOD TO 
EDUCATIONAL PROBLEMS 


ROBERT W. B. JACKSON 


Department of Educational Research 
University of Toronto 

The Analysis of Variance and Covariance method has been used for 
many years in analyzing the results of agricultural experiments. 
Within the last few years, it has also been used in analyzing the results 
of educational experiments and it is likely that more use will be made 
of it in the future in this particular field. The Analysis of Covariance 
method in particular should prove to be useful, as it enables the worker 
to measure, and correct for, the effect of factors other than the ones 
in which he is mainly interested, and which it has not been possible to 
control adequately. 

It is not always wise, however, to adopt without question a method 
which has been developed mainly for use in another field. This 
method seems to admit of general application, but only experience can 
show whether or not it will prove to be as useful as it appears to be at 
first sight. The writer has found the method to be very useful, but it 
seems that it may be necessary to modify it slightly if it is to be used in 
some of our educational problems. The purpose of the present paper, 
therefore, is to introduce for discussion some of the difficulties met with 
in using the Analysis of Covariance method, and to suggest solutions for 
these. The theory underlying the analysis discussed will not be given 
here, partly because it is felt that this might tend to confuse the non- 
mathematical reader. If others are interested in these problems, a 
more detailed and technical discussion may be given in a later paper. 


EXAMPLE 1. RELATIONSHIP BETWEEN SCORES OF PUPILS ON TWO 
INTELLIGENCE TESTS 


The results shown in Table I(a) refer to the scores of one hundred 
fifteen pupils in four classes on two Intelligence tests; they are 
presented in the form generally employed in analysis of this type. 
It will be noticed that for both tests the differences between the means 
of classes are significant; it is the practice in this school to grade the 
pupils according to ability. 

The problem in which we were interested was that of determining 
the accuracy with which we can estimate the scores on test T from 
those in test D. This is, of course, a simple problem in regression; the 
difficulty here is that there are three regression coefficients, not just 
one, as shown in Table I(b). Which should be used? 
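
The three coefficients can be recovered from the sums of squares and products of Table I(a). The sketch below assumes that each coefficient is the regression of T on D for its row, that is, the sum of products DT divided by the sum of squares of D. The printed values of Table I(b) are not available for comparison, and the names used are illustrative.

```python
# Sums of squares and products from Table I(a):
# (sum of squares test D, sum of squares test T, sum of products DT)
TABLE_1A = {
    "between classes": (2870.179, 15339.886, 6492.184),
    "within classes": (11194.551, 58054.062, 21330.173),
    "total": (14064.730, 73393.948, 27822.357),
}


def regression_coefficient(ss_d, sp_dt):
    """Slope of the regression of T on D: sum of products / sum of squares of D."""
    return sp_dt / ss_d


coefficients = {
    row: regression_coefficient(ss_d, sp_dt)
    for row, (ss_d, _ss_t, sp_dt) in TABLE_1A.items()
}

for row, b in coefficients.items():
    print(f"{row:16s} b = {b:.4f}")
```

The between-classes slope comes out somewhat larger than the within-classes slope, with the total slope lying between the two, which is the situation the question above is concerned with.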


TABLE I(a).—ANALYSIS OF VARIANCE AND COVARIANCE OF SCORES OF PUPILS ON 
TWO INTELLIGENCE TESTS 

Variance           Degrees of   Sum of squares   Sum of squares   Sum of products
                   freedom      test D           test T           DT
Between classes      3           2,870.179       15,339.886        6,492.184
Within classes     111          11,194.551       58,054.062       21,330.173
Total              114          14,064.730       73,393.948       27,822.357

TABLE I(b).—REGRESSION COEFFICIENTS 

Coefficient        Numerical value
[Entries not legible in the source.]

TABLE I(c).—ADJUSTED ANALYSIS OF VARIANCE OF SCORES OF PUPILS ON TEST T: 
USING WITHIN CLASSES REGRESSION COEFFICIENT 

Variance           Degrees of   Sum of squares   Mean square
                   freedom      test T
Within classes     110          17,411.41        158.29
[Remaining rows not legible in the source.]


This is one of the difficulties to be discussed; there are actually four 
possible analyses (they do not differ very much in this case). In the 
earlier work,¹,² it was suggested that the “within classes” regression 
coefficient should be used in correcting all three rows. Using this 
value, we obtain the results shown in Table I(c). The one degree of 
freedom corresponding to the regression coefficient used in correcting is 
subtracted from the degrees of freedom allotted to within classes. 
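
The earlier procedure can be sketched numerically. The code below assumes that a row is corrected with a slope b by forming SS(T) − 2b·SP(DT) + b²·SS(D), which reduces to the usual residual sum of squares when b is the row's own slope. Only the within-classes line of Table I(c) can be compared with the printed table; the between-classes and total figures are computed under the stated assumption and are not taken from the source.

```python
# Table I(a): (SS test D, SS test T, SP DT, degrees of freedom)
BETWEEN = (2870.179, 15339.886, 6492.184, 3)
WITHIN = (11194.551, 58054.062, 21330.173, 111)


def adjusted_ss(ss_d, ss_t, sp_dt, b):
    """Sum of squares of T after removing a regression on D with slope b."""
    return ss_t - 2.0 * b * sp_dt + b * b * ss_d


# The within-classes slope is applied to every row in this procedure.
b_within = WITHIN[2] / WITHIN[0]

within_adj = adjusted_ss(*WITHIN[:3], b_within)
between_adj = adjusted_ss(*BETWEEN[:3], b_within)
total_adj = within_adj + between_adj  # one common slope, so the rows still add

within_df = WITHIN[3] - 1  # one degree of freedom is spent on the slope
within_ms = within_adj / within_df

print(f"within classes:  df = {within_df}, SS = {within_adj:.2f}, MS = {within_ms:.2f}")
print(f"between classes: df = {BETWEEN[3]}, SS = {between_adj:.2f}")
print(f"total:           SS = {total_adj:.2f}")
```

The within-classes sum of squares and mean square agree with the values 17,411.41 and 158.29 printed in Table I(c).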

1 Fisher, R. A.: Statistical Methods for Research Workers. Edinburgh: Oliver 
and Boyd, 4th edition, 1932, pp. 257-262. 

2 Snedecor, George W.: Calculation and Interpretation of Analysis of Variance 
and Covariance. Ames: Iowa Collegiate Press, Monograph Number One, 1934, 
pp. 65-87. 




This method is no longer used, at least not by most workers, 
probably because it is difficult to justify the use of the “within classes” 
coefficient for adjusting all three rows. The method now used¹,² is to 
correct the “within classes” row by using the “within classes” regression 
coefficient and the total row by using the total regression coefficient, 
and then to subtract to obtain the “reduced” or “adjusted” sum of squares 
for between classes. The analysis of the results by this method is 
shown in Table I(d). 
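
The later procedure may be sketched in the same way. The code assumes that each of the two rows is reduced by its own regression, SS(T) − SP(DT)²/SS(D), and that the between-classes line is then found by subtraction with three degrees of freedom. The between-classes figures are computed here and are not taken from the source, where only the within-classes line is legible.

```python
# Table I(a): (SS test D, SS test T, SP DT)
WITHIN = (11194.551, 58054.062, 21330.173)
TOTAL = (14064.730, 73393.948, 27822.357)


def residual_ss(ss_d, ss_t, sp_dt):
    """Sum of squares of T about the row's own regression on D."""
    return ss_t - sp_dt ** 2 / ss_d


within_adj = residual_ss(*WITHIN)      # 111 - 1 = 110 degrees of freedom
total_adj = residual_ss(*TOTAL)        # 114 - 1 = 113 degrees of freedom
between_adj = total_adj - within_adj   # 113 - 110 = 3 degrees of freedom

within_ms = within_adj / 110
between_ms = between_adj / 3

print(f"between classes: df = 3,   SS = {between_adj:.2f}, MS = {between_ms:.2f}")
print(f"within classes:  df = 110, SS = {within_adj:.2f}, MS = {within_ms:.2f}")
print(f"total:           df = 113, SS = {total_adj:.2f}")
```

Because the between-classes line is obtained by subtraction, it keeps all three of its degrees of freedom, whatever the relation between the slopes may be.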


TABLE I(d).—ADJUSTED ANALYSIS OF VARIANCE OF SCORES OF PUPILS ON TEST T: 
USING WITHIN AND TOTAL REGRESSION COEFFICIENTS 

Variance           Degrees of   Sum of squares   Mean square
                   freedom      test T
Within classes     110          17,411.41        158.29
[Remaining rows not legible in the source.]


If all three regression coefficients are used (as is often done) correcting 
each row by using the corresponding regression coefficients, we have the 
results shown in Table I(e). 


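TABLE I(e).—ADJUSTED ANALYSIS OF VARIANCE OF SCORES OF PUPILS ON TEST T: 
USING WITHIN AND TOTAL REGRESSION COEFFICIENTS 

Variance                                 Degrees of   Sum of squares   Mean square
                                         freedom      test T
[Rows 1, 2, and 4 not legible in the source.]
3. Sum of between and within classes     112          18,066.34
5. Difference                              1             290.40        290.40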

Row 3 is the sum of rows 1 and 2; row 4 is the total corrected by using
the total regression coefficient; and row 5 is the difference between rows
4 and 3. It will be noticed that the analysis of Table I(e) differs from
that shown in I(d) only by this one degree of freedom and its associated
sum of squares: in I(e) these are shown separately, while in I(d) they
are included in the “between classes” row.

1 Fisher, R. A.: Statistical Methods for Research Workers. Edinburgh: Oliver
and Boyd, 7th edition, 1938, pp. 279-294.

2 Snedecor, George W.: Statistical Methods. Ames: Iowa Collegiate Press,
1938, pp. 249-273.

To what does this degree of freedom, and its associated sum of
squares, correspond? It is actually a measure of the difference
between the “between classes” and “within classes” regression
coefficients.¹,² If we use the analysis shown in Table I(d), this is combined
with the residual “between classes” variation and, for this reason, it is
suggested that this particular analysis may be misleading. If the
difference between the regression coefficients is significant, it will, if
combined, tend to make the variance “between classes” significant.
If, on the other hand, it is not significant, it will, if combined, tend to
make the variance “between classes” non-significant. In some cases,
as in the above, it matters very little how we analyze the results, but
this is not always true.

We may, of course, test the significance of the difference between 
these regression coefficients by comparing the mean square corresponding
to it (290.40) with the reduced mean square “within classes”
(158.29). In this case they are not significantly different, and we may 
conclude that there is a common regression coefficient. It is not 
difficult to show that the best estimate of this common regression 
coefficient is the one shown as the total in Table I(b), (1.978). 
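
The comparison just described can be sketched numerically. The following is a minimal illustration, not part of the original analysis: the two mean squares are those quoted above, while the 5 per cent point of F for 1 and 110 degrees of freedom (about 3.93) is an approximate tabled value supplied here as an assumption.

```python
# Mean squares quoted in the text for Table I(e).
ms_difference = 290.40   # difference between regression coefficients, 1 d.f.
ms_within = 158.29       # reduced "within classes" mean square, 110 d.f.

# Assumption: approximate tabled 5 per cent point of F for (1, 110) d.f.
F_CRIT_5_PER_CENT = 3.93


def variance_ratio(ms_numerator, ms_denominator):
    """Return the variance ratio F of two mean squares."""
    return ms_numerator / ms_denominator


f_ratio = variance_ratio(ms_difference, ms_within)
significant = f_ratio > F_CRIT_5_PER_CENT
print(f"F = {f_ratio:.2f}; significant at 5 per cent: {significant}")
```

The ratio falls well below the assumed critical value, which agrees with the statement that the two regression coefficients do not differ significantly.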


TABLE I(f). ADJUSTED ANALYSIS OF VARIANCE OF SCORES OF PUPILS ON TEST T:
TOTAL REGRESSION COEFFICIENT FOR ALL ROWS

Variance | Degrees of freedom | Sum of squares, test T | Mean square


Since we have shown that there is no significant difference between
the regression coefficients, it is suggested that we should use the
estimate of the common coefficient to correct all rows of the table. The
results of this analysis are shown in Table I(f); this is the fourth
possible method of analyzing these data.

1 Rider, Paul R.: An Introduction to Modern Statistical Methods. New York:
John Wiley & Sons, 1939, pp. 150-157.

2 Rider, Paul R.: “The Analysis of Covariance.” Official Report on the 1940
Meeting of the American Educational Research Association, pp. 114-117.

This last analysis is valid only when there is no significant difference
between the regression coefficients. In cases of doubt, and as a first 
step in any analysis, it is suggested that the analysis shown in Table 
I(e) should be used. As a matter of fact, it is doubtful if any general 
method can be recommended as the conditions underlying the problem 
will determine the type of analysis to be used. This point is shown 
clearly in the next two rather unusual examples.


EXAMPLE 2. RELATIONSHIP BETWEEN MENTAL AND CHRONOLOGICAL 
AGES OF PAIRS OF FRATERNAL TWINS¹


The results shown in Table II(a) refer to an analysis of the mental 
and chronological ages of pairs of fraternal twins. This example is 
unusual in that for within pairs the sum of squares for chronological 
age, and the sum of products, are zero—as must be the case, of course, 
since the members of the same pair of twins are of the same chronological
age. This kind of an example does not seem to occur in agricultural
experiments; the analysis is, therefore, particularly interesting. 


TABLE II(a). ANALYSIS OF VARIANCE AND COVARIANCE OF MENTAL AND
CHRONOLOGICAL AGES OF PAIRS OF FRATERNAL TWINS

Variance | Degrees of freedom | Sum of squares, chronological age | Sum of squares, mental age | Sum of products
Between pairs | 25 | 16,162.231 | 15,389.308 | 9,693.385
Within pairs | 26 | 0 | 6,664.000 | 0
Total | 51 | 16,162.231 | 22,053.308 | 9,693.385


The problem here is to remove the effect of chronological age from 
the mental age analysis. Since only the “between pairs” and the total
variances are affected, it is clear that in this case we must leave unchanged
the “within pairs” sum of squares for mental age. The suggested 
analysis is shown in Table II(b): Only the “between pairs” and total 
sum of squares are adjusted. 

1 Jackson, Robert W. B.: Application of the Analysis of Variance and Covariance 
to Educational Problems. Toronto: Bulletin No. 11 of the Department of Educa- 
tional Research, University of Toronto, 1940, pp. 67-74. 




TABLE II(b). ADJUSTED ANALYSIS OF VARIANCE OF MENTAL AGE OF PAIRS OF
FRATERNAL TWINS: USING BETWEEN PAIRS REGRESSION COEFFICIENT

Variance | Degrees of freedom | Sum of squares, mental age | Mean square
Within pairs | 26 | 6,664.000 | 256.308
Total | 50 | 16,239.648 |
Due to effect of chronological age | 1 | 5,813.660 | 5,813.660


We find that, in the adjusted analysis, the variance “between
pairs” is not significantly greater than the variance “within pairs.”
The contribution attributable to the effect of chronological age is
significant, however, so we may conclude that the significant difference
in the unadjusted analysis of mental age between the “between pairs”
and “within pairs” variance was due to the effect of chronological age.
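
The arithmetic behind this adjustment can be retraced from the figures of Table II(a). The sketch below is illustrative only and assumes the usual reduction due to linear regression, (sum of products)² divided by the sum of squares of the independent variable; the function and variable names are hypothetical. The adjusted “between pairs” row is computed here and is not quoted from the printed table.

```python
def reduction_due_to_regression(ss_x, sum_products):
    """Sum of squares of y removed by linear regression on x: Sxy^2 / Sxx."""
    return sum_products ** 2 / ss_x


# Table II(a): x = chronological age, y = mental age.
between = {"df": 25, "ss_x": 16162.231, "ss_y": 15389.308, "sp": 9693.385}
within = {"df": 26, "ss_y": 6664.000}   # no adjustment: Sxx and Sxy are zero
total_ss_y = 22053.308

# One degree of freedom goes to the "between pairs" regression coefficient.
reduction = reduction_due_to_regression(between["ss_x"], between["sp"])
adj_between_ss = between["ss_y"] - reduction
adj_between_df = between["df"] - 1
adj_total_ss = total_ss_y - reduction

ms_between = adj_between_ss / adj_between_df
ms_within = within["ss_y"] / within["df"]

f_between = ms_between / ms_within   # residual "between pairs" effect
f_age = reduction / ms_within        # effect of chronological age, 1 d.f.

print(f"Due to chronological age: {reduction:.3f}")
print(f"Adjusted total: {adj_total_ss:.3f}")
print(f"F (between pairs) = {f_between:.2f}; F (age) = {f_age:.2f}")
```

The reduction and the adjusted total reproduce the entries of Table II(b), and the two variance ratios agree in direction with the conclusions stated above.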

The analysis suggested here is different from those considered in the 
first example; in fact, none of those considered earlier seem to apply. 
In this case, at least, we could not adopt without change the methods 
used in another field. 


EXAMPLE 3. RELATIONSHIP BETWEEN HIGH SCHOOL AND COLLEGE 
GRADES¹


This is an interesting example of a case in which care must be 
exercised in applying the Analysis of Covariance method. The origi- 
nal analysis is shown in Table III(a). 


TABLE III(a). ANALYSIS OF VARIANCE AND COVARIANCE OF HIGH-SCHOOL AND
COLLEGE GRADES

Variance | Degrees of freedom | Sum of squares, high-school grades | Sum of squares, college grades | Sum of products
Between high schools | 14 | 28.81 | 19.51 | -11.38
Within high schools | 795 | 273.60 | 492.79 | 216.74
Total | 809 | 302.41 | 512.30 | 205.36


1 Dressel, Paul L.: “Effect of the High School on College Grades.” Journal of
Educational Psychology, Vol. xxx, No. 8, Nov., 1939, pp. 612-617.




It will be noticed that the relationship between high-school and
college grades is negative for “between high schools” and positive for
“within high schools.” The problem is how to adjust the college
grades analysis to allow for the high-school differences. The author
used only the “within high schools” regression coefficient for this
purpose, as was suggested in the earlier work on this method. The
regression coefficients are very different here, however, as one is
positive and the other negative, and it is difficult to justify this procedure.
If we use an analysis similar to that shown in Table I(d), using the
within and total regression coefficients, we obtain the results shown in
Table III(b). It will be noticed immediately that, strangely enough,
the “between high schools” sum of squares is now larger than that with
which we started. What has caused this? It may easily be shown
that it is due to the inclusion in this row of the very significant
difference between the “between” and “within high schools” regression
coefficients. If we make an analysis similar to that shown in Table
I(e), we obtain the results shown in Table III(c). The difference
between the regression coefficients accounts for the increase noted in
Table III(b) and is, in fact, the most significant factor. Therefore, in
this case also, care must be taken in applying the Analysis of
Covariance method.

TABLE III(b). ADJUSTED ANALYSIS OF VARIANCE OF COLLEGE GRADES: USING
WITHIN AND TOTAL REGRESSION COEFFICIENTS

Variance | Degrees of freedom | Sum of squares, college grades | Mean square
Between high schools | 14 | 51.75 | 3.70

TABLE III(c). ADJUSTED ANALYSIS OF VARIANCE OF COLLEGE GRADES: USING
BETWEEN, WITHIN AND TOTAL REGRESSION COEFFICIENTS

Variance | Degrees of freedom | Sum of squares, college grades | Mean square
Between high schools | 13 | 15.01 | 1.15
Within high schools | 794 | 321.09 | 0.404
Sum of between and within high schools | 807 | 336.10 |
Due to difference between regression coefficients | ...
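
The two adjusted analyses can be retraced from Table III(a). The following is a sketch under the assumption that each row is adjusted by subtracting (sum of products)² divided by the high-school sum of squares; the names are hypothetical. The sum of squares for the difference between regression coefficients is derived here by subtraction and is not quoted from the printed table.

```python
def adjusted_ss(ss_x, ss_y, sum_products):
    """Sum of squares of y after removing linear regression on x."""
    return ss_y - sum_products ** 2 / ss_x


# Table III(a): x = high-school grades, y = college grades.
rows = {
    "between": (28.81, 19.51, -11.38),
    "within": (273.60, 492.79, 216.74),
    "total": (302.41, 512.30, 205.36),
}

coefficients = {name: sp / ss_x for name, (ss_x, ss_y, sp) in rows.items()}
adj = {name: adjusted_ss(*values) for name, values in rows.items()}

# Analysis of the Table III(b) type: "between" obtained by subtraction.
between_by_subtraction = adj["total"] - adj["within"]

# Analysis of the Table III(c) type: "between" adjusted by its own coefficient.
between_own = adj["between"]
due_to_difference = adj["total"] - (adj["between"] + adj["within"])

print(f"b(between) = {coefficients['between']:.3f}, "
      f"b(within) = {coefficients['within']:.3f}")
print(f"Between, by subtraction: {between_by_subtraction:.2f}")
print(f"Between, own coefficient: {between_own:.2f}")
print(f"Due to difference of coefficients: {due_to_difference:.2f}")
```

The subtraction method gives a “between high schools” sum of squares larger than the unadjusted 19.51, because it absorbs the single degree of freedom for the difference between a negative and a positive coefficient.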

There is one further point: In some examples the reduced or residual 
mean square between groups is significantly greater than that within 
groups. According to Rider, this is to be interpreted as meaning that 
the regression of the group means is not linear. It is felt, however, 
that this may not always be the correct explanation. If the regression 
is non-linear, the residual mean square will, of course, be significant 
(the writer has found several cases in which the factor of non-linearity 
was important), but it does not follow that in all cases of a significant 
residual mean square the only factor involved is the non-linearity of 
the regression. In some cases the results are affected by factors other 
than the one considered, but in other cases, such as in the following
example, there is no obvious explanation; one or more of the pairs of
means seem merely to be “out of line.”


EXAMPLE 4. RELATIONSHIP BETWEEN SCORES OF PUPILS ON TWO 
INTELLIGENCE TESTS 


The results shown in Table IV(a) refer to the scores of five hundred 
ninety-four pupils in five grades on two Intelligence tests. The prob- 
lem here, as in the first example, was to determine the accuracy with 
which we could estimate the scores on Test A from those on Test B. 


TABLE IV(a). ANALYSIS OF VARIANCE AND COVARIANCE OF SCORES OF PUPILS ON
TWO INTELLIGENCE TESTS

Variance | Degrees of freedom | Sum of squares, test A | Sum of squares, test B | Sum of products
Between grades | 4 | 105,750.94 | 110,544.78 | 107,838.60
Within grades | 589 | 71,088.18 | 74,731.90 | 63,985.83
Total | 593 | 176,839.12 | 185,276.68 | 171,824.43


Forming the adjusted analysis of the scores on test A, as suggested
in Table I(e), we obtain the results shown in Table IV(b). As the
difference between the regression coefficients is significant, the analysis
must be left in this form. The peculiar feature of these results is that
the reduced variance “between grades” is significantly greater than
that “within grades.” The mean scores on the two tests for each
grade are given in Table IV(c); there is little or no evidence of
departures from linearity. The residual “between grades” effect seems to
be explained by the fact that, for some unknown reason, one pair of
means (for grade 5) is out of line.

TABLE IV(b). ADJUSTED ANALYSIS OF VARIANCE OF SCORES ON TEST A: USING
BETWEEN, WITHIN AND TOTAL REGRESSION COEFFICIENTS

Variance | Degrees of freedom | Sum of squares, test A | Mean square
Between grades | 3 | 552.27 | 184.09
Sum of between and within grades | 591 | 16,855.46 |
Due to differences between regression coefficients | 1 | 634.76 | 634.76

TABLE IV(c). MEAN SCORES ON TWO INTELLIGENCE TESTS: BY GRADES

Grade | Mean score, test A | Mean score, test B
3 | 21.4 | 26.0
4 | 34.5 | 39.4
5 | 45.1 | 52.5
6 | 54.1 | 59.6
7 | 59.6 | 64.8
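
One simple way to see which pair of means is out of line is to fit a straight line to the five pairs of grade means in Table IV(c) and inspect the residuals. The sketch below is only illustrative: the numbers of pupils in each grade are not given, so an unweighted least-squares fit is assumed, and the names are hypothetical.

```python
grades = [3, 4, 5, 6, 7]
test_a = [21.4, 34.5, 45.1, 54.1, 59.6]   # mean scores, Table IV(c)
test_b = [26.0, 39.4, 52.5, 59.6, 64.8]


def fit_line(x, y):
    """Unweighted least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Regression of the test A means on the test B means.
slope, intercept = fit_line(test_b, test_a)
residuals = {
    g: a - (intercept + slope * b)
    for g, a, b in zip(grades, test_a, test_b)
}
worst = max(residuals, key=lambda g: abs(residuals[g]))

for g in grades:
    print(f"grade {g}: residual {residuals[g]:+.2f}")
print(f"largest departure: grade {worst}")
```

Under this assumption the grade 5 mean on test A lies farthest from the fitted line, and below it, which agrees with the remark in the text.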

It is suggested that in cases such as this, where the residual
“between groups” mean square is significantly greater than the
“within groups” mean square, a further analysis of the data must be
made in order to determine the factor or factors affecting the results. 
Very often a study of the individual contributions to the residual sum of 
squares will suggest a solution of the problem. 


CONCLUSION 


Examples have been given to illustrate some of the difficulties which 
arise in applying the Analysis of Covariance method to educational 
problems. It is felt that these show the inadvisability of adopting, 
without adapting, a statistical method developed mainly for use in 
another field. It is hoped that this discussion will be of interest to 
others who wish to use this method in the field of education. 




INDIVIDUAL DIFFERENCES IN JUDGING 
MULTIPLE-CHOICE QUESTIONS 


FRANCES GRITTEN AND DONALD M. JOHNSON 
Fort Hays Kansas State College 


Despite the widespread use of objective tests, the determinants of 
response in the test situation have not been adequately studied. The 
response is a judgment, and judgments are made with more or less 
confidence. This paper is chiefly concerned with the rôle of individual
differences in confidence in the test situation. 

Consider first the effect of instructions. Instructions may be given 
(a) not to guess, or (b) to answer all questions, i.e., to guess. Various
writers have expressed opposite opinions on the merits of the two kinds
of instructions. Most of these opinions are collected in Ruch’s
general treatment of objective tests.⁷ Lee and Symonds⁵ have
summarized the more recent literature. Apparently the only empirical
investigation of the effect of the instructions has been that of Ruch
and DeGraff.⁸ They compared reliability and validity coefficients
obtained by the two kinds of instructions. The results seemed slightly, 
and ambiguously, in favor of instructions against guessing. To the 
present writers’ knowledge the problem remains in this inconclusive 
state. 

From the psychological point of view instructions against guessing 
require the subject to discriminate between guessing and knowing, as 
if these were qualitatively different processes. Some writers even 
seem to imply that guessing is slightly immoral. The outcome may 
be different, that is, the response may be scored right or wrong, but 
the judging process, from the subject’s side, is the same in both cases. 
Obviously, whatever name the scorer may attach to the process, the 
student uses all the pertinent data he can recall. From a more sophis- 
ticated point of view the difference between guessing and knowing is a 
quantitative one which lies in the confidence one feels in the outcome 
of the judgment.

Now it is known that there are large and consistent individual 
differences in confidence in a judgment.?* Therefore, when instruc- 
tions are given not to guess, one would expect that the number of 
items a subject attempts will depend not only upon his knowledge 
but also upon his confidence. This leads to our first hypothesis: 
That, when instructions are given not to guess, the more confident people 
will attempt more items. The chances are that those who attempt
more items will pass more. Whether the more confident people will
thus get higher scores or not depends upon the adequacy of the correc- 
tion. The present investigation is designed to secure empirical data 
on these points. It is merely necessary to get reliable scores for con- 
fidence and to correlate them with various achievement scores, such 
as number attempted, number right, number right minus corrections, 
etc. 

The adequacy of the correction in eliminating confidence from the 
scores can be checked by comparing the correlations with confidence 
of the corrected and the uncorrected scores. A further check would be 
supplied by correlating corrected and uncorrected scores with another 
criterion of achievement in the same subject-matter. These correla- 
tions would be validity coefficients in a relative sense since they would 
show which scores have most in common with a test scored by some 
other method or, in other words, which scores are least invalidated by 
the scoring system. Our second hypothesis, therefore—which follows 
logically from the first—is that those scores which are least affected by 
individual differences in confidence will show the highest validity. 


PROCEDURE 


Two forms of the Nelson-Denny Vocabulary Test were used. 
This is part of the Nelson-Denny Reading Test for high-school and 
college students. A test should be rather difficult to bring out indi- 
vidual differences in confidence, so only the last sixty-five items were 
used. The test was given without time limit in order to eliminate the 
effect of any speed factor. It will be evident from the results to 
follow that the reliability of the test given in this manner is not less 
than .81. 

Four college classes of twenty to thirty students were used. Three 
of the classes were in elementary psychology and one was in educational 
psychology. Two classes took Form A first with instructions not to 
guess. A day or two later they took Form B with instructions to 
answer all questions. The order was reversed for the other two classes. 
The total number taking the tests was one hundred six; one hundred 
three records were complete. 

The measure of confidence was obtained when Form B was given 
with instructions to answer all questions. After the judgment of 
each item was made, the subjects indicated their confidence in that 
judgment on a five-point scale ranging from 0 per cent confidence, or 
a pure guess, to 100 per cent confidence, or complete certainty. The
intermediary points, 25, 50, and 75 per cent, were defined in reference
to the end-points. Johnson⁴ has found, using similar material, that
this method of reporting confidence gives reliability coefficients 
between .85 and .96 for tests of thirty items. From the Spearman- 
Brown prophecy formula we would expect reliabilities above .92 for 
sixty-five items. 
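
The expectation stated here can be checked with the Spearman-Brown prophecy formula, r' = kr / (1 + (k - 1)r), where k is the ratio of the new test length to the old. The sketch below is a minimal illustration; the function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))


# Thirty-item reliabilities of .85 and .96, stepped up to sixty-five items.
length_factor = 65 / 30
low = spearman_brown(0.85, length_factor)
high = spearman_brown(0.96, length_factor)
print(f"predicted reliabilities: {low:.3f} to {high:.3f}")
```

Both predicted values lie above .92, as the text expects.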

Several achievement scores were used. (1) Number of items 
attempted on Form A. If the subjects really followed the instructions 
against guessing, this would be a useful score. No one believes that 
they do follow the instructions, so this score is rarely used. It is 
included here in order to test the hypothesis that there is a relation 
between confidence and the number of items attempted. (2) Number 
right. This score without correction is included in order to test the 
adequacy of the correction. (3) The conventional correction formula
is R − W/(n − 1), where R is the number right, W the number wrong,
and n the number of choices. In the Nelson-Denny Vocabulary Test there
are five choices, so the formula becomes R − W/4. (4) Attempts have
been made recently, e.g., by Soderquist,⁹ to increase the reliability of
objective tests by penalizing heavily for errors. We cannot duplicate
Soderquist’s conditions, because he gave special instructions to the
subjects about guessing and weighting. It will be informative,
however, to penalize heavily for errors and correlate the scores with the
confidence scores. The correction formula, R − W, gives quadruple
weight to the errors in a five-choice test. Scores calculated by this 
formula were included, then, in order to secure further evidence on the 
effectiveness of the correction. 
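
The two corrected scores can be illustrated for a single record. This is a sketch only: the record of rights, wrongs, and omissions is hypothetical and is not taken from the study, and the function names are invented for the illustration.

```python
def conventional_score(right, wrong, choices):
    """Conventional correction: R - W/(n - 1)."""
    return right - wrong / (choices - 1)


def heavy_penalty_score(right, wrong):
    """Rights minus wrongs: R - W."""
    return right - wrong


# Hypothetical record (not from the study): 65 items of five choices each.
right, wrong, omitted = 30, 12, 23
score_conventional = conventional_score(right, wrong, choices=5)
score_heavy = heavy_penalty_score(right, wrong)

# Penalty per error under each formula, and their ratio.
penalty_conventional = 1 / (5 - 1)
penalty_heavy = 1
print(score_conventional, score_heavy, penalty_heavy / penalty_conventional)
```

Each error costs one quarter of a point under the conventional formula and a full point under R − W, which is the quadruple weight mentioned in the text.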


RESULTS 


All subjects took Form A with instructions not to guess, and the 
four achievement scores described above were computed from this 
form. From Form B two scores were computed, an achievement 
score (the number right) and a confidence score (the mean of the 
sixty-five confidence reports). Table I shows the means and disper- 
sions of these six distributions. 

The mean scores in Table I show that the tests were quite difficult. 
Likewise, the confidence scores were low. The confidence scores 
should be interpreted in the light of the instructions to the subjects. 

Table II requires some explanation. Correlations above .26 are
significant. The number attempted on Form A shows a correlation
of .519 with mean confidence on Form B. This correlation supports
our first hypothesis, that the more confident will attempt more items 
and, incidentally, validates the test of confidence in a judgment. 
This score shows a very low correlation with the other achievement 
score, and, therefore, could not be considered a valid indication of 
achievement. 


TABLE I. MEANS AND DISPERSIONS OF VARIOUS SCORES ON TWO FORMS OF THE
NELSON-DENNY VOCABULARY TEST

Score | Mean | SD
Form A (“do not guess”).
Form B (“answer all questions”).
Mean .. | 46.2 | 15.9


TABLE II. CORRELATIONS OF VARIOUS ACHIEVEMENT SCORES WITH CONFIDENCE
ON FORM B, AND WITH ACHIEVEMENT ON FORM B

 | Form A, number attempted | Form A, number right | Form A, R − W/4 | Form A, R − W | Form B, number right
Confidence on Form B | .519 | .537 | .446 | .326 | .338
Achievement on Form B | .379 | .728 | .807 | .765 |


The number right on Form A likewise shows a moderately high 
correlation with confidence. It correlates fairly closely with achieve- 
ment on Form B, and, therefore, has some validity as a measure of 
achievement. 


The score given by the formula, R − W/4, shows a correlation with
confidence of only .446, indicating that the correction does eliminate
confidence to some extent, but not entirely. The correlation with
achievement is .807, indicating rather high validity for this score.
Apparently the involvement of confidence in this score does not
invalidate the score greatly. Correlations between two forms of the
same test, given and scored the same way, are often not much higher 
than this. 

The score given by the formula, R − W, shows the smallest correlation
with confidence. It is smaller, even, than the correlation between
confidence and achievement on Form B. Apparently the heavy 
penalty for errors aids in eliminating confidence from the achievement 
score. But the validity coefficient by this formula is also lower than 
the validity coefficient by the conventional formula. Nothing is 
gained, therefore, by this heavy penalty for errors. 

In general, Table II indicates that, when instructions are given 
not to guess, confidence is involved in the score, unless a correction is 
made. The table suggests—though the evidence is not clear-cut— 
that the more valid scores are those which do not involve confidence 
to a marked degree. 


DISCUSSION 


In interpreting the correlations between confidence and the various 
achievement scores obtained under instructions not to guess, one must 
consider also the correlation between confidence and achievement 
obtained under instructions to answer all questions. In the present 
investigation this correlation was .338. Johnson⁴ correlated
confidence and achievement on vocabulary tests in eight fields of
knowledge and got correlations ranging from .07 to .67, with the median at
.27. Apparently there is likely to be some relation between achieve- 
ment and confidence whatever instructions and scoring are used. 
However, the correlations with confidence of Number Attempted and 
Number Right in the present investigation are high enough to justify 
the conclusion that the more confident people attempt more items and 
get more right. 

Since the conventional correction seems to work well in increasing 
validity, as Ruch’s summary also shows,⁷ it is worth while to examine
again the theory of this correction. It is often referred to as a correc- 
tion for guessing; sometimes as a correction for chance. It is applied 
because of the fact that some people attempt more items than others. 
The assumptions of the correction are that those who attempt more 
will get more right in accordance with the laws of probability, and 
that this excess or unfair portion of the raw score can be eliminated by 
a correction based on the laws of probability. What remains after 
the excess is removed is the number each individual would have got
right if all individuals had the same minimal tendencies toward
attempting the items. Since confidence is probably the largest factor 
(aside from knowledge) in the tendency to attempt the items, the 
correction could properly be called a correction for individual differ- 
ences in confidence. 
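
The probability argument behind the correction can be made concrete. The sketch below assumes a subject who attempts a number of five-choice items by blind guessing alone, so that each choice is equally likely to be marked; the function name and the figure of twenty guessed items are hypothetical.

```python
from fractions import Fraction


def expected_scores(guessed_items, choices):
    """Expected rights, wrongs, and corrected scores under blind guessing."""
    right = Fraction(guessed_items, choices)
    wrong = guessed_items - right
    conventional = right - wrong / (choices - 1)   # R - W/(n - 1)
    heavy = right - wrong                          # R - W
    return right, wrong, conventional, heavy


right, wrong, conventional, heavy = expected_scores(20, 5)
print(right, wrong, conventional, heavy)
```

On this assumption the conventional correction removes exactly the expected gain from the guessed items, while R − W removes more than was gained and so penalizes the subject who attempts more.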

Dunlap and others¹ pointed out some time ago that, when instructions
against guessing are given, the uncorrected scores have a spuriously
large dispersion and, consequently, a spuriously high reliability
coefficient because of the inclusion of errors of measurement affecting 
both odd and even scores. These errors of measurement are individual 
differences in confidence and tendencies to attempt the items. 

Although the data of Table II indicate that the conventional 
correction is a good one, it is possible that an even better one could be 
found—perhaps by one of the statistical methods discussed by Ruch 
(7, pp. 345-353) in conjunction with the psychological considerations 
advanced above. 

It should be added that the correlation of .807 shows the validity 
of the instructions to guess. Both instructions to answer all questions 
and instructions not to guess with a correction show satisfactory 
validity. 

The results of the present investigation are helpful in understanding 
the attempt of Wiley and Trimble¹¹ to use the objective test as a
measure of a personality trait. They had college students mark their
answers on four psychology tests for “certainty,” “doubt” and
“guess.” Some consistency appeared from test to test in the frequency
of use of these three categories of response. For use of “certainty”
the average raw correlation between tests was .656, for “doubt” .576,
for “guess” .566. Wiley and Trimble conclude that the ordinary
objective test can be used, with special instructions, for the measure- 
ment of certain personality variables yet to be defined. 

The definition is easy. Confidence is the personality variable with 
which Wiley and Trimble were working. The method of reporting 
confidence used in the present experiment is more direct and makes use 
of a continuous quantitative scale, but the personality variable is the 
same. Reliability coefficients were not computed in the present 
study, but reliabilities of .82 and .88 have been obtained in attitude 
tests of twenty items,’ and reliabilities of .85 to .96 have been obtained 
in vocabulary tests of thirty items.⁴ Considerable consistency of
confidence in several fields has been demonstrated.?* The validity 
of the method has been shown in the present study. 




Frances Swineford¹⁰ has also been interested in the personality
variable which complicates test performance. She gave the subjects 
special instructions about guessing and weighting for errors. From 
the results she computed a gambling score based on the percentage of 
errors for which maximum credit was claimed. The validity of this 
gambling score has, apparently, not been established, but it had a 
reliability of .796. However, another score, frequency of claim of 
maximum credit, had a reliability of .953. This score probably is 
another way of expressing confidence. Our impression is that the 
more direct confidence score will give more information about the 
student’s personality than the gambling ratio. Ratios have always 
been difficult to interpret psychologically. 


SUMMARY AND CONCLUSIONS 


Two forms of a vocabulary test of sixty-five items were given to 
one hundred three college students. Form A was given with instruc- 
tions not to guess; Form B with instructions to answer all questions, 
i.e., to guess. An achievement score and a confidence score were 
obtained from Form B. These were correlated with several scores for 
achievement on Form A. The results permit the following conclusions. 

(1) When instructions are given not to guess, the more confident 
subjects will attempt more items, and will get more right. 

(2) Valid scores, on which the effect of individual differences in 
confidence is slight, are given by instructions to answer all questions 
and by instructions not to guess, with the conventional correction. 
The correction, from the point of view of this paper, is properly called 
a correction for individual differences in confidence. 

(3) The objective test can be used, with special instructions, to 
secure a useful measure of confidence in a judgment. 


BIBLIOGRAPHY 


1. Dunlap, J. W., DeMello, A. and Cureton, E. E.: “The effects of different
directions and scoring methods on the reliability of a true-false test.”
School and Society, Vol. xxx, 1929, pp. 378-382.

2. Johnson, Donald M.: “Confidence and speed in the two-category judgment.”
Arch. Psychol., 1939, No. 241, pp. 52.

3. Johnson, Donald M.: “Confidence and the expression of opinion.”
Jour. Soc. Psychol., Vol. xii, 1940, pp. 213-220.

4. Johnson, Donald M.: “Confidence and achievement in eight branches of
knowledge.” Jour. Educ. Psychol., Vol. xxxii, 1941, pp. 23-36.

5. Lee, J. M. and Symonds, P. M.: “New-type or objective tests: a summary of
recent investigations.” Jour. Educ. Psychol., Vol. xxiv, 1933, pp. 21-28;
and Vol. xxv, 1934, pp. 161-184.

6. Nelson, M. J. and Denny, E. C.: The Nelson-Denny Test. Houghton Mifflin,
1930.

7. Ruch, Giles M.: The Objective or New-type Examination. Scott, Foresman,
1929.

8. Ruch, G. M. and DeGraff, M. H.: “Corrections for chance and ‘guess’ vs.
‘do not guess’ instructions in multiple-response tests.” Jour. Educ.
Psychol., Vol. xvi, 1926, pp. 368-375.

9. Soderquist, Harold O.: “A new method of weighting scores in a true-false
test.” Jour. Educ. Res., Vol. xxx, 1936, pp. 290-292.

10. Swineford, Frances: “The measurement of a personality trait.” Jour. Educ.
Psychol., Vol. xxix, 1938, pp. 295-300.

11. Wiley, L. M. and Trimble, O. C.: “The ordinary objective test as a possible
criterion of certain personality traits.” School and Society, Vol. xii, 1936,
pp. 446-448.




THAT VAGUE WORD, CONDITIONING 


HAROLD SAXE TUTTLE 
College of the City of New York 


So frequent are the occurrences of the word “conditioning” in
educational literature today, and so important is the process on
which it throws a flood of new light, that every educator should have
a clear picture of the kind of mental changes to which the word refers. 
Yet so vague and ill-defined, often so ambiguous, are the uses of this 
new word that readers can hardly fail to be confused—if, indeed, 
authors themselves always know precisely what they are saying! 
Conditioning is not one of those words whose derivation shouts its 
definition to all; rather it acquires its meaning from the usage which 
brought it into the vocabulary of psychology. Having taken root in 
the language of mental science it appears to be here to stay. Since 
we must deal with it, we shall be wise to know what we are dealing 
with. 

Because it came in through the psychological laboratory, it is not 
a word whose meaning can be arbitrarily defined to fit the mood of the 
writer. It refers to a function of mental life, and the only way to 
understand its meaning is to understand that function. Mental 
processes, like the musical scale, must be discovered. They cannot
be invented. Laws of learning, like the law of gravity or Boyle’s 
Law, cannot be fixed by legislation or fiat. They remain unchanged, 
however much we may misunderstand or misstate them. 

In the first place, the word “conditioning” refers only to a process,
not to a process-and-its-object. Specifically, it does not necessarily
mean conditioning reflexes; it may refer with equal accuracy to
conditioning other outcomes. When you tell me that you have been
chopping wood, I am not obliged thereafter to assume that all chopping
deals with wood. The butcher may have been chopping meat; the
cook may have been chopping cabbage. When the laundress refers 
to washing clothes, I am not obliged to assume that all washing applies 
to clothes. The maid washes dishes; we all wash our hands and faces. 
While it is true that the first conditioning which was scientifically 
discussed was limited to reflexes, that limitation no longer holds. We 
now know that conditioning may and does apply to other forms of 
behavior. Its definition must refer to the nature of the process, not to
some selected object of the process.


In order sharply to distinguish the process of conditioning from 
its objects, as also to understand it in relation to its objects, we must 
note the different outcomes to which it may apply. For a long time 
teachers thought of learning as either memorizing or acquiring skill. 
Under the stimulating influence of Herbart, analysis and synthesis 
took their place in the list of learning types. Gradually this process 
of reflective thinking has gained prestige until, in the estimate of many, 
it is the most important type of learning. 

All three types have been recognized as closely related. To the 
behaviorist they are different in the degree of complexity of stimulus- 
response bonds set up. Skills represent fairly simple direct bonds— 
a straight line connection between stimulus and motor response, a sort 
of laying of railway rails end to end to form a clear track. Memory 
represents a direct series of connections, but with richer accompani- 
ments—the weaving of a thread out of many fibers. Reflective 
thinking involves highly intricate connections—a spider web with 
contacts at many points of an outside supporting framework. 

To the functional psychologist the three types are different in the 
degree of originality exercised. Skill exercises none; it is mere repeti- 
tion of a pattern supplied by others and repeated until the pattern is 
unconsciously followed. Memory requires enough initiative to incor- 
porate an experience into the total system of meanings already estab- 
lished, but thereafter merely revives the pattern of meaning thus 
created. Reasoning is the really creative process; meanings actually 
experienced are projected into the future as patterns not yet experi- 
enced. Reasoning thus provides for desired adjustments, it responds 
to novel situations, it solves problems. 

Whether from the behavioristic or the functional approach or from 
other possible approaches, skill, memory and reasoning are closely 
related in nature. They deal with the direction of sense impressions 
into channels of conduct. They are fashioned out of sensations or 
sense images. They are intellectual functions. 

As mental life has been made the object of scientific study, it has 
become increasingly clear that its scope is wider than that of intel- 
lectual functions. It includes affective aspects also. Conduct cannot 
be wholly accounted for by purely intellectual processes. There is a 
dynamic element in all behavior which eludes intellectual analysis. 
Desired conduct often fails to follow logical proof that such conduct is 
necessary. We do not do what we know we should. Knowledge and
judgment fail to assure good behavior. Information and reasoning are
devoid of driving power. 

That necessary element—driving power—has long been referred 
to as motive. But motive has been a mystery. To discover what 
factors might serve as motives has not been difficult; but a method of 
producing these factors has been lacking. The creation of motives has 
had no established place in the school program. The motive-produc- 
ing process has been little understood. Some have wondered about 
it; a few have seriously wished it might be controlled; but very few have 
really come to grips with the problem of bringing the cultivation of 
motives under control as a teaching method. 

This is the process involved in conditioning. Conditioning is the 
learning activity which produces drives, urges, motives. If one wishes 
to distinguish between physiological and psychological aspects of this 
process one will prefer the more precise statement: Conditioning estab- 
lishes dynamic tendencies; these tendencies are consciously experi- 
enced as interests. When the environment opens a way for the 
expression of these interests they become conscious motives. 

What is the nature of this process by which interests are created? 
The first step was given publicity by Pavlov, although he died without 
realizing the range of possible applications of the principle he had so 
patiently developed at the reflex level. The conditioning of dogs to 
respond systematically and dependably to stimuli that were previously 
meaningless—salivation at the sound of a bell, flexion of a leg at the 
sight of a light—has been widely publicized. The evidences by no 
means support the conclusion that all conditioning deals with reflexes; 
they prove only that conditioning is a process of learning which follows 
discoverable laws, a process subject to planned control, a process 
different in nature from intellectual learning. Only this! But this is 
as revolutionary for education as was the Copernican system for
astronomy! 

For the educator the extensive experiments with human subjects 
conducted by Thorndike are of more practical significance than 
Pavlov’s experiments, limited as they were to reflexes and performed 
only on animals. Thorndike found that interests and attitudes could 
be created and modified directly by associating the feelings of one
experience with another, previously neutral, experience. He found 
that the association of feelings with any type of experience tended to 
create definite attitudes. In other words, attitudes are learned
directly, not by some roundabout reasoning process. He proved that
conditioning is not limited to reflexes, but is applicable to all levels 
of learning. He discovered also that learning by conditioning does 
not depend on the wish of the subject to learn, or even on the
knowledge that he is learning.

In any typical activity, say, a child’s first attendance at school, 
intellectual learning and conditioning blend, since life is organic, 
inter-related, unified; so that the child himself is unaware of any 
distinction. But the observer can see that when he is bullied or 
ridiculed by certain children he tends to avoid them. The games 
which he can play with greatest success he chooses consistently in 
preference to those in which he is least skilful. Those children who 
have pleasant teachers return more willingly to school than those who 
have cross teachers. 

Intellectually the child may analyze the behavior of others, 
especially the more popular, to determine what brings approval. 
He may devise bribes to gain favor or reprisals for his opponents. 
But back of all intellectual processes conditioning is at work whenever 
satisfaction or annoyance are felt, directly—without reasoning, 
memory, or anticipation of consequences—weakening the tendency 
to repeat each unpleasant act and adding zest to each pleasant act. 
Under conditioning a pattern of action takes on a dynamic tendency, 
positive or negative. Its inner quality is modified by the feelings 
associated with it. Intrinsic in the thought of the act, deeper and 
swifter than logic can work, there is a voice that whispers imperiously 
to the muscles, “advance” or “retreat.”

While Thorndike calls the process “associative shifting,” and prefers
to classify the conditioning with which Pavlov dealt as a limited
phase of the larger aspect of affective learning, the word “conditioning”
has already become established as including affective learning as a
process at all levels of application, so that Thorndike and all the rest of 
us will have to make the best of it. Ideally there should be a word 
explicitly designating this process and nothing else, a word excluding all 
other phases of learning, a word sharply defined as meaning “direct 
learning of attitudes or dynamic tendencies by association of feelings.”¹
For one who has the confidence of the educational world to bring such a
word into general usage would be a helpful service. For reflexes have
been so closely attached to the idea of conditioning that many will
assume, in spite of all warnings and evidences to the contrary, that all
conditioning means a mechanical process of conditioning reflexes, and
that a mechanistic philosophy of education is implicit. This is
particularly unfortunate, for the broader application of conditioning
points in exactly the opposite direction.

¹ The form “impone” (from the Latin imponere) accurately describes the direct
process of impressing an urge into a pattern of behavior. But “impone” is not
established as an English word.

If reflexes are not the only objects of conditioning, what other 
outcomes, we have asked, may be conditioned? The materials for 
our answer have now been supplied. The kinds of outcomes that may 
be brought about in mental life are the kinds that may be made objects 
of conditioning. Many educators from many approaches have arrived 
at essentially the same classification. When we learn—when we 
change our tendencies of behavior—we may change our (1) skills 
(which are largely chains of reflexes) in speed or precision; we may 
acquire new facts—or more accurately (2) beliefs, for some sense 
impressions are misleading, as “‘a straight stick bent in a pool”; we 
may adapt past experiences to novel situations, that is we may (3) 
form judgments, solve problems, make inventions; or we may change 
our (4) tastes, interests, attitudes. 

Conditioning may apply to all of these types of learning. It is not
limited to the first, although, fortunately, it applies there. The con- 
ditioning of reflexes is highly important in an age of scientific appli- 
ances. We drive powerful engines at high speed over narrow highways, 
meeting and passing other similar mechanisms, controlling rate and 
direction without consciousness of the muscles involved. We touch 
keys on a typewriter or piano in combinations and at rates that thought 
can not follow. Highly complex patterns of conditioned reflexes make 
possible intricate activities, and effect a saving of time and effort 
beyond our realization. But patterns of reflexes are not the only 
objects of conditioning. 

Attitudes (numbered 4 above) are also acquired by conditioning. 
Habits of courtesy, however mechanical at first, become attitudes when 
social approbation has been consistently associated with them over a 
period of time. The more homely attitude of neatness is established 
in the same way. Truthfulness is likewise learned by conditioning. 
So also are honesty and chivalry and generosity and kindness, and all 
the other traits that make human society tolerable or lift it to a level 
of rich enjoyment. Negative traits constitute no exception to this 
law of conditioning. The Oriental’s fear of “losing face” is an excellent
illustration of its potency. So are shyness, secretiveness, suspicion,
stealing, lying and brutality. All social and anti-social tendencies
are products of social approvals, conscious or unconscious, accidental 
or deliberate. 

But conditioning may apply also to the outcomes that might seem 
purely intellectual—beliefs and judgments. Many of the ideas to 
which we subscribe with conviction were acquired not by observation, 
analysis and logic, but by emotional imposition, social pressure, crowd 
psychology—forms of conditioning, all. Unfortunately this law of 
learning leads to many disastrous results—traditionalism, supersti- 
tion, stereotypes, fanaticism, and general paralysis of independent 
thinking. Nevertheless the law holds. A clearer understanding of the 
law may enable the educator to prevent much of the distress that has 
followed blind adoption of emotionalized beliefs. When an assertion
(an economic doctrine, a political theory, a religious dogma) is made
in association with strong feeling it is impressed directly, without the 
necessity of thinking. If the statement is frequently repeated under 
consistently similar conditions of feeling it soon becomes deeply 
implanted. It comes to be a part of personality as surely as are such 
traits as courtesy, neatness or coéperativeness. Any opposing belief 
becomes an object not of inquiry, but of aversion. There is no uncer- 
tainty or curiosity; therefore, no inducement to give the matter 
logical thought. Disagreement means opposition; it is dangerous. 
One’s tendency is to defend the established belief. If it is attacked 
too vigorously, fanaticism develops.

Conditioning, then, is a process of learning produced solely by 
associated feelings; strengthening an attitude or interest if pleasant, 
weakening it if unpleasant. It is thus a process of direct learning 
differing from intellectual learning. The feelings are the forces that 
produce the changes. Conditioning may be applied to skills and 
habits, leading to mechanized behavior, to beliefs, leading to stereo- 
types and all that this implies, and to tastes, interests and attitudes. 
The implications of each of these phases are numerous and important. 
But the present task is merely to define the term “conditioning,” not 
to interpret its wide applications. 

Vague and ambiguous uses of the term should be instantly corrected 
in the thinking of the reader. When attitudes are said to be “conditioned
by new knowledge” the mind should ring up the challenge
“false definition.” When reference is made to “the points at which
the economic order condition one’s philosophy” the stop sign of
“ambiguous usage” should be flashed. Whenever such terms as
“persuasion,” “indoctrination,” “I feel that . . . ” are used, clear-cut
definitions should be demanded.

The careful reader has a right to insist that conditioning be treated 
as a process of direct learning produced by feelings of satisfaction or 
annoyance, and that judgment be recognized as a process of organizing 
images of previous experience into new patterns. While the two 
aspects of mental life cannot be separated in the consciousness of the 
individual who chooses and acts, they can be differentiated by the 
observer. They must be sharply distinguished and clearly understood 
by the educator before he can intelligently interpret the multiplying 
literature dealing with conditioning and, what is infinitely more 
important, before he can effectively create socially wholesome motives. 




ANALYSIS OF A PERSONALITY TRAIT 


FRANCES SWINEFORD 
The University of Chicago 


A previous article introduced a formula for measuring a personality 
trait by means of any objective test.¹ The trait was defined as the
tendency to gamble, and was found to be independent of the achieve- 
ment score on the same test. The data consisted of a seventy-five- 
item true-false test administered to one hundred sixty college students. 
More recently, the writer has had an opportunity to apply the formula 
to four tests covering different kinds of material given to three hundred 
forty-four high-school freshmen. 

The method of obtaining the gambling score will be reviewed 
briefly. The pupil is permitted to ask for credit of two, three, or 
four points for each question, with the understanding that twice the 
requested credit will be deducted from his score if his answer is wrong. 
It may be assumed that the pupil is gambling on his score against 
odds of two to one to the extent that he asks for extra credit for those 
items on which he is guessing. There being no way to separate the 
items guessed correctly from those representing correct knowledge, 
the gambling score must be based upon the incorrect items, all of 
which may be regarded as guesses. Occasionally, of course, an error 
may represent misinformation or erroneous judgment or reasoning 
instead of a guess. Evidence was presented in the earlier study, 
however, to show that the gambling score was not seriously affected by 
such items in the achievement test which was used. The formula 
adopted to measure gambling, or G, was based on only the items 
marked “4,” as follows:

\[ G = \frac{\text{Errors marked ``4''}}{\text{Total errors} + \tfrac{1}{2}\,\text{omissions}} \times 100 \]


The omissions included in the denominator are the items which were 
skipped within the test—not those omitted at the end of the test for 
lack of time. Thus it is assumed that the skipped items would have 
been guessed if answered, and that half of them would have been 
guessed wrong. If the test is of multiple-choice type, then the skipped 
items may be treated as errors in the computation of G, for they are 
usually few in number and the discrepancy would be but slight. The
score on the test itself was not weighted as the pupils believed it would
be, but was merely the number of correct items.

¹ Swineford, Frances: “The Measurement of a Personality Trait.” Journal of
Educational Psychology, Vol. XXIX, April, 1938, pp. 295-300.
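
The computation of G described above may be sketched as follows. This is only an illustration under stated assumptions: the function name, its arguments, and the sample figures are hypothetical and do not come from the study. Skipped true-false items are counted at half weight, and skipped multiple-choice items are counted as full errors, as the text permits.

```python
def gambling_score(errors_marked_4, total_errors, skipped_items, multiple_choice=False):
    """Return G, the percentage of (estimated) wrong guesses that were marked "4".

    errors_marked_4 : wrong answers for which four points were requested
    total_errors    : all wrong answers among the items attempted
    skipped_items   : items skipped within the test (not those left at the end)
    multiple_choice : if True, skipped items are treated as full errors
    """
    # Half of the skipped true-false items are assumed to be wrong guesses.
    weight = 1.0 if multiple_choice else 0.5
    denominator = total_errors + weight * skipped_items
    if denominator <= 0:
        # Pupils with no errors were dropped from the study; G is undefined here.
        raise ValueError("G is undefined when there are no errors or omissions")
    return 100.0 * errors_marked_4 / denominator


# Hypothetical pupil: 10 errors, 4 of them marked "4", 4 items skipped.
example_true_false = gambling_score(4, 10, 4)
example_multiple_choice = gambling_score(4, 10, 4, multiple_choice=True)
print(round(example_true_false, 2), round(example_multiple_choice, 2))
```

A pupil who made no errors has a zero denominator, which agrees with the exclusion of such pupils from the gambling study.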

In the present study four of the tests included in a larger study 
carried on at the Thornton Township High School, Harvey, Illinois,²
were arranged to yield measures of G. One is a non-language test, 
Paper Form Board, in which each of the twenty-eight items consists of 
one of four geometrical figures cut into three or four sections. The 
subject is to determine which figure the sections would fit if they were 
reassembled. The second test, General Information, is a multiple- 
choice test of one hundred items covering factual information. The 
third is a fifty-item multiple-choice test of vocabulary. The fourth 
is a thirty-six-item true-false test of logical deduction based on series 
of inequalities written in terms of letters of the alphabet. With 
the exception of Paper Form Board and the first half of General 
Information, which were given during the same testing period, no two 
of these tests were administered during the same week. 

Of the four hundred fifty-seven pupils who were tested, seventy- 
four boys and thirty-nine girls were eliminated from the gambling 
study either because on one or more of these tests no extra credits were 
requested, or because on one or more tests no errors were made among 
the items attempted. The elimination of those who failed to ask for 
extra credit may have discriminated against some of the non-gamblers, 
but comments of the pupils at the time of the testing revealed that 
many of them simply became absorbed in the subject-matter of the 
tests and forgot the extra-credit instructions. The mean test score 
differences between the three hundred forty-four pupils retained for 
this study and the one hundred thirteen who were dropped show a 
tendency for the former to have lower scores than the latter. These 
differences divided by their respective probable errors are -.83, -.04,
-1.36, and -3.54, of which only that for the deduction test is
statistically significant. The battery of tests was used for the purpose of
measuring certain independent factors; namely, general, spatial,
verbal, mental speed, and memorizing factors, which were defined by
means of factor analysis. Again, the critical ratios for the mean
differences, -1.29, -1.77, -.57, 1.82, and -2.50, favor the group
that was dropped, except that for the mental speed factor.

Mean G scores leave no doubt that the boys have higher gambling
ratings than the girls. The differences are about twelve per cent for
the verbal tests and twenty-one per cent for the non-verbal tests.
The type of test material, likewise, elicits significantly different reactions.
There is less tendency to gamble when verbal tests of a type
familiar to all school children are used, but when tests which were
probably quite new to the pupils were used, the G score increased
materially. This increase amounts to about twenty-seven per cent
for the boys and about eighteen per cent for the girls.

² Manuscript in preparation.


TABLE I. MEANS, STANDARD DEVIATIONS, AND DIFFERENCES BETWEEN MEANS
OF G SCORES FOR 189 BOYS AND 155 GIRLS


                     |         Mean          |  Standard deviation   | Boys minus girls
Test                 | Boys  | Girls | Total | Boys  | Girls | Total | Mean  | PE
Paper Form Board     | 52.82 | 31.68 | 43.29 | 35.16 | 28.29 | 33.92 | 21.14 | 2.31
General Information  | 27.69 | 14.87 | 21.91 | 23.29 | 13.46 | 20.50 | 12.82 | 1.35
Word meaning         | 28.32 | 15.94 | 22.74 | 28.30 | 16.66 | 24.56 | 12.38 | 1.66
Deduction            | 57.00 | 35.29 | 47.22 | 36.63 | 31.02 | 35.88 | 21.71 | 2.46
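
The sex differences in Table I can be checked with a short calculation. The sketch below assumes the classical formulas, in which the probable error of a mean is .6745 times the standard deviation divided by the square root of N, and the probable error of a difference between independent means is the root of the sum of the squared probable errors. The source does not state which formulas were used, and the function and variable names are illustrative only.

```python
import math

N_BOYS, N_GIRLS = 189, 155

# test: (mean boys, mean girls, SD boys, SD girls, published difference, published PE)
TABLE_I = {
    "Paper Form Board": (52.82, 31.68, 35.16, 28.29, 21.14, 2.31),
    "General Information": (27.69, 14.87, 23.29, 13.46, 12.82, 1.35),
    "Word meaning": (28.32, 15.94, 28.30, 16.66, 12.38, 1.66),
    "Deduction": (57.00, 35.29, 36.63, 31.02, 21.71, 2.46),
}


def pe_of_mean(sd, n):
    """Probable error of a mean (assumed classical formula)."""
    return 0.6745 * sd / math.sqrt(n)


def pe_of_difference(sd_a, n_a, sd_b, n_b):
    """Probable error of the difference between two independent means."""
    return math.hypot(pe_of_mean(sd_a, n_a), pe_of_mean(sd_b, n_b))


for test, (m_boys, m_girls, sd_boys, sd_girls, diff, pe) in TABLE_I.items():
    recomputed_pe = pe_of_difference(sd_boys, N_BOYS, sd_girls, N_GIRLS)
    critical_ratio = diff / pe
    print(f"{test}: PE recomputed {recomputed_pe:.2f}, critical ratio {critical_ratio:.1f}")
```

Under these assumptions the recomputed probable errors fall within a hundredth or two of the tabled values, and every difference exceeds seven times its probable error.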


As the relatively large standard deviations suggest, the distribu- 
tions of G are not in any sense normal. All are positively skewed, with 
modes near zero. Those for the non-language tests have larger 
modes at one hundred in the case of the boys. These distributions 
are condensed in Table II. Smaller intervals were employed for all 
calculations. 


TABLE II. FREQUENCY DISTRIBUTIONS OF G SCORES COMPUTED FROM FOUR TESTS


          | Paper Form Board | General Information | Word meaning  | Deduction
G score   | M | F | Total    | M | F | Total       | M | F | Total | M | F | Total
90-100....} 45 53 13] ...| 13] 60! 16] 76 
75- 89.... 20) 29 5} 6 1 7{ 144 #7 
60- 74.. 15} 9) 24 si 12] 11] 22 
45- 59.. 21; 17} 38] 16 2} 418] 12) 6 418] 21) 7 28 
30- 44.. 21; 50/ 28 27] 141 411] 25) 31] 56 
15- 29.. 21; 33} 54] 63) 40] 103] 44 40) 84 ‘61 
14.. 381 96] 63] 92) 155 79) 90} 169] 28) 52) 80 
Total 189) 155| 344 189 155] 344 | 189) 155) 344 189| 155) 344 


TABLE III. CORRELATIONS OF G SCORES WITH TEST SCORES AND FACTOR
ESTIMATES
(Values in Italics Are at Least Three Times Their Probable Errors)


Test from which       | Test  | General | Spatial | Verbal | Speed  | Memory
G score was computed  |       | factor  | factor  | factor | factor | factor
Boys
Paper Form Board      | .226  | .189    | .160    | .010 .014
General Information   | .084  | -.087   | -.056   | a      | .076   | -.166
Word meaning          | .048  | -.081   | -.018   | .089   | .080   | -.105
Deduction             | -.024 | .017    | .099    | .138   | -.049  | .062
Girls
Paper Form Board      | .105  | .079    | .080    | -.047  | -.002  | -.077
General Information   | -.170 | -.176   | .008    | -.102  | .001   | -.083
Word meaning          | -.020 | -.019   | -.053   | -.018  | -.075  | .001
Deduction             | .013  | .001    | .050    | .027   | .007   | -.036
Total
Paper Form Board      | .248  | .154    | .217    | .075   | -.017  | -.106
General Information   | .049  | -.087   | .010    | .051   | .026   | -.208
Word meaning          | .026  | -.039   | .008    | .051   | .010   | -.133
Deduction             | .012  | .029    | .117    | .087   | -.046  | -.063


In the earlier paper the G score was shown to be independent of 
the achievement score for the test from which it was calculated. 
Similar correlations have been computed for the four sets of G scores 
of the present data. These appear in the first column of Table III, 
which includes values for the one hundred eighty-nine boys, one 
hundred fifty-five girls, and the total group. Of these twelve correlations,
three are statistically significant but are small enough that for
practical purposes they may be ignored. It is evident, therefore,
that ability in the field covered by the test does not affect tendency to 
gamble. 

The last five columns of Table III contain the correlations between 
the G scores and estimates of five factors. The factors are by definition 
statistically independent; their estimates are very nearly so. Ten of 
these sixty correlations exceed three times their probable errors, but 
in no case is the same coefficient significant for both boys and girls. 
The only significant correlations which show any degree of consistency
are those for the G scores computed from non-language tests and the
spatial factor, and those for the G scores computed from the verbal 
tests and the memory factor, all in the section for the total group. 
All these values, however, can be attributed to sex differences, for the 
boys’ factor estimates were significantly greater than the girls’ in the 
spatial factor and significantly smaller than the girls’ in the memory 
factor. It is not clear why the remaining correlations with the spatial 
and memory factors for the total group are not also significant. In 
general, it may be concluded that no important relationship exists 
among the G scores and those factors which have been measured for 
this group. 

It remains to be shown that the G scores are consistent from test to 
test. The intercorrelations of the four G measures, computed for 
each group, appear in Table IV. Due to the significant sex differences
shown in Table I, the total group yields slightly higher correlations 
than does either sex group. None of the sex differences among the 
correlations is significant, but the girls’ correlations for the G scores 
computed from the deduction test are consistently low. 


TABLE IV. INTERCORRELATIONS OF THE G SCORES


                          |       Boys         |       Girls        |       Total
Test                      | 1    | 2    | 3    | 1    | 2    | 3    | 1    | 2    | 3
1. Paper Form Board.............
2. General Information...........
3. Word meaning           | .397 | .790 | ...  | .428 | .754 | ...  | .447 | .798 | ...
4. Deduction              | .440 | .381 | .394 | .303 | .201 | .256 | .448 | .388 | .398


Examination of the correlations reveals that a G factor common 
to all the tests and an overlapping factor between the verbal tests may 
be postulated. The high values between the verbal tests seem to 
support the hypothesis suggested in connection with the mean G 
scores; namely, that familiarity with the test material has some effect 
upon the tendency to gamble. Here, the G scores for the verbal tests 
are more consistent than those for any other pair of tests. The G 
scores for the non-language tests are no more closely related than those 
for either non-language test with either verbal test. The verbal tests
represent the more familiar material. On the other hand, the verbal
tests are longer than the non-language tests, and for this reason provide 
more reliable G scores. It is likely that such reliability accounts for 
at least part of their high intercorrelations. 

The G-factor weights have been computed for each group, and are 
listed in Table V, together with regression coefficients in standard- 
score form for estimating the factor. This material has been pre- 
sented to show that the G factor can be estimated from four tests as 
reliably as the factors employed in Table III could be estimated from 
thirteen tests. The multiple-correlation coefficients for the G factor 
are .831 for the boys, .830 for the girls, and .847 for the total group. 
The multiple correlations for the other five factors range from .761 
to .901, with a mean value of .857. Doubtless longer and more 
difficult tests would give rise to more reliable measures of G. In 
several instances, the number of errors was so small as to render the 
corresponding G score particularly unreliable. 


TABLE V. G-FACTOR WEIGHTS AND REGRESSION COEFFICIENTS


                                  |    Factor weights    | Regression coefficients
Test                              | Boys | Girls | Total | Boys | Girls | Total
Paper Form Board                  | .679 | .760  | .725  | .388 | .575  | .431
General Information               | .599 | .543  | .641  | .172 | .046  | .187
Word meaning                      | .596 | .601  | .630  | .170 | .284  | .176
Deduction                         | .648 | .398  | .618  | .345 | .142  | .282
Multiple-correlation coefficient  |                      | .831 | .830  | .847
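
The use of Table V can be sketched as follows. Two assumptions are made that the source does not state outright: that each factor weight is the correlation of a G score with the G factor, and that the regression coefficients are standardized, so that the squared multiple correlation equals the sum of the products of coefficient and weight. The function names are illustrative only.

```python
import math

TESTS = ["Paper Form Board", "General Information", "Word meaning", "Deduction"]

# Values read from Table V, in the order of TESTS.
FACTOR_WEIGHTS = {
    "boys": [0.679, 0.599, 0.596, 0.648],
    "girls": [0.760, 0.543, 0.601, 0.398],
    "total": [0.725, 0.641, 0.630, 0.618],
}
REGRESSION = {
    "boys": [0.388, 0.172, 0.170, 0.345],
    "girls": [0.575, 0.046, 0.284, 0.142],
    "total": [0.431, 0.187, 0.176, 0.282],
}


def estimate_g_factor(z_scores, group="total"):
    """Estimate the G factor from four G scores expressed as standard scores."""
    if len(z_scores) != len(TESTS):
        raise ValueError("one standard score is needed for each of the four tests")
    return sum(b * z for b, z in zip(REGRESSION[group], z_scores))


def multiple_correlation(group):
    """Multiple R, assuming R squared = sum(beta_i * r_i) for standardized betas."""
    r_squared = sum(b * w for b, w in zip(REGRESSION[group], FACTOR_WEIGHTS[group]))
    return math.sqrt(r_squared)


for group in ("boys", "girls", "total"):
    print(group, round(multiple_correlation(group), 3))
```

Under these assumptions the recomputed coefficients agree with the tabled values of .831, .830, and .847 to within rounding.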


SUMMARY 


Four tests have been used to measure a personality trait, which 
has been subjectively defined as the tendency to gamble. Ninth- 
grade pupils were given an opportunity to gamble against odds of two 
to one that their guessed responses on the tests were correct. The 
extent to which they availed themselves of this opportunity has been 
measured. The following conclusions and inferences have been 
drawn: 

(1) Boys have a significantly greater tendency to gamble on their 
test scores than do girls, particularly on an unfamiliar type of test. 




(2) Both boys and girls have a significantly greater tendency to 
gamble on unfamiliar material than on familiar material. 

(3) None of the distributions of G scores approaches normality 
for the material used in this study. 

(4) Except for a few statistically significant but unimportant 
correlations, both positive and negative, the G scores are independent 
of the scores on the tests from which they were computed and also 
independent of five mental factors which have been measured by a 
larger battery of tests. 

(5) The intercorrelations among the G scores calculated from the 
four tests are sufficiently high to yield a multiple-correlation coefficient 
of .85 when all four measures are combined in a regression estimate of 
the G factor. 




RELIABILITY OF MULTIPLE-CHOICE MEASURING 
INSTRUMENTS AS A FUNCTION OF THE 
SPEARMAN-BROWN FORMULA, V 


H. H. REMMERS AND H. W. SAGESER 
Purdue University 


This study was conducted to test the application of the Spearman- 
Brown prophecy formula to a generalized or master attitude scale 
when used in multiple-choice forms. Experimental studies dealing 
with the same problem (papers II to IV) in this series*:*»* have shown 
the hypothesis under investigation to be supported by the data. 
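
For reference, the Spearman-Brown prophecy formula in its usual form may be sketched as below. The way in which the authors relate the number of choices to the lengthening factor is not shown in this part of the paper, so the sketch gives only the general formula; the function name and the sample values are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a measure is lengthened length_factor times.

    reliability   : reliability coefficient of the original measure
    length_factor : ratio of the new length to the original length
    """
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Illustrative values only: a half-test correlation of .60 stepped up to full length.
print(round(spearman_brown(0.60, 2), 2))
```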

“The Scale to Measure Attitude Toward Any Practice,”’ which was 
developed at Purdue University by H. H. Remmers and H. W. Bues,’ 
was chosen as the subject for the investigation. This scale was made 
up of two equivalent forms, each having thirty-seven statements. 


PROCEDURE 


A scale was constructed by combining the two equivalent forms,
using the statements of Form A as the odd-numbered items (1, 3, 5, 7,
etc.) and the statements of Form B as the even-numbered
items (2, 4, 6, 8, etc.), giving a resultant list of seventy-four
statements.
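
The combining of the two forms may be pictured with the following sketch. The statement labels are placeholders, since the wording of the scale is not reproduced here, and the function name is hypothetical.

```python
def interleave_forms(form_a, form_b):
    """Merge two equivalent forms so that Form A supplies items 1, 3, 5, ...
    and Form B supplies items 2, 4, 6, ..."""
    if len(form_a) != len(form_b):
        raise ValueError("equivalent forms must contain the same number of statements")
    combined = []
    for statement_a, statement_b in zip(form_a, form_b):
        combined.append(statement_a)  # odd-numbered position
        combined.append(statement_b)  # even-numbered position
    return combined


# Placeholder labels standing in for the thirty-seven statements of each form.
form_a = [f"A{i}" for i in range(1, 38)]
form_b = [f"B{i}" for i in range(1, 38)]
scale = interleave_forms(form_a, form_b)
print(len(scale), scale[:4])
```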

Four sets of these scales were prepared with only the number of key 
numbers varying between the four sets. The two-choice set had only 
two key numbers (2 and 1) for indicating agreement or disagreement. 
The three-choice set had three key numbers (3, 2, 1) for indicating 
agreement, undecided, or disagreement. The five-choice set had five 
key numbers (5, 4, 3, 2, 1) for indicating strong agreement, mild 
agreement, undecided, mild disagreement, strong disagreement. The 
seven-choice set had seven key numbers (7, 6, 5, 4, 3, 2, 1) for indi- 
cating very strong agreement, strong agreement, mild agreement, 
undecided, mild disagreement, strong disagreement, or very strong 
disagreement.

Attitude objects were chosen that would be applicable to the lives 
of college students, and concerning which there would be considerable 
range of attitude both for and against. One subject was ‘‘Compul- 
sory Semester (final) Examinations,” and the other was “Forcing 
Freshmen to Enter Extra-Curricular Activities.” 

The University students living in Cary Hall (University Residence 
Hall for Men) were asked to help by filling these forms. Dining-room
groups were taken as units. There were eight dining-rooms, two
groups for each set of scales. These sets of two were assumed to be 
randomly selected and therefore equivalent for purposes of the 
experiment. 

The forms were distributed just before dinner to the rooms of the 
students in a dining-room group, an explanation was given at dinner, 
and it was requested that they fill in the forms that evening or as soon 
as convenient. The forms that had been filled in were collected by a 
door-to-door canvass later in the same evening. Students who filled
in their forms after the door-to-door canvass left the forms at the
office of the building in which they lived. The forms were collected 
for several days afterwards. About forty-two per cent were returned, 
but not in equal numbers for each set although equal numbers of each 
set were distributed. Investigation revealed that the set having the 
least number of returns had been unwittingly distributed on the 
night of several social activities which caused many students to leave 
their rooms without filling out the forms. Also, due to handling by 
different persons in scoring and checking over a period of several 
months, some papers were damaged or lost. 

Table I shows the number of papers available for each obtained 
correlation. 


TABLE I.—NUMBER OF CASES USED IN OBTAINING CORRELATIONS 

Attitude objects: Compulsory examinations | Forcing freshmen into extra-curricular activities 
Number of choices (for each attitude object): 2 | 3 | 5 | 7 

                                        108 | 106 | 112 
First scoring (unweighted) ....... 87 | 108 | 106 | 112 
                                         96 |  96 | 112 
Second scoring (weighted) ........ 87 |  94 |  92 | 112 


SCORING THE FORMS 


A. First Scoring, Using Uniform Values for All Statements 
(Unweighted).—The statements used, as already indicated, were 
scaled attitude statements and the key numbers decreased in size as 
they varied from strong agreement to strong disagreement. There- 
fore, the key numbers given by the students for statements below the 
point of neutrality were reversed in value before adding; that is, a 
7 was replaced by a 1, a 6 by a 2, a 5 by a 3, in the seven-choice forms. 
In the five-choice forms the 5 was replaced by a 1, a 4 by a 2. In the 
three-choice forms the 3 was replaced by a 1. In the two-choice forms 
the 2 was replaced by a 1. 

These changes having been made, the papers were scored by total- 
ing the key numbers given to the odd-numbered statements, and then 
totaling the key numbers given to the even-numbered statements on 
the same test paper. The totals were written on the form. 
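
The reversal and totaling just described can be sketched in a few lines of code. This is a minimal illustration and not the authors' own procedure: the function names, the set `below_neutral` (the positions of statements lying below the point of neutrality), and the assumption that the reversal mirrors every key number about the middle of the scale (so that a 1 also becomes a 7 on the seven-choice form) are supplied here for illustration only.

```python
def reverse_key(key, n_choices):
    """Mirror a key number about the middle of the scale (7 -> 1, 6 -> 2, ...)."""
    return n_choices + 1 - key


def unweighted_totals(keys, below_neutral, n_choices):
    """Return (odd_total, even_total) for one paper.

    keys          -- key numbers in statement order (statement 1 first)
    below_neutral -- hypothetical set of statement positions to be reversed
    n_choices     -- number of key numbers on the form (2, 3, 5, or 7)
    """
    odd_total = 0
    even_total = 0
    for position, key in enumerate(keys, start=1):
        if position in below_neutral:
            key = reverse_key(key, n_choices)
        if position % 2 == 1:
            odd_total += key    # Form A statements (odd-numbered)
        else:
            even_total += key   # Form B statements (even-numbered)
    return odd_total, even_total
```

The two totals returned for each paper correspond to the two totals that were written on the form.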

B. Second Scoring, Using Bues’ Scale Values for the Statements 
(Weighted).—The key number given each statement was multiplied 
by the scale value of each statement. These products were totaled 
for the odd-numbered statements, and then totaled for the even- 
numbered statements. The totals were written on the form. 
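
A corresponding sketch for the weighted scoring is given here. The function name is an assumption, and any scale values passed to it in an example are hypothetical placeholders rather than values from the scale used in the study.

```python
def weighted_totals(keys, scale_values):
    """Return (odd_total, even_total) of key number times scale value.

    keys         -- key numbers in statement order (statement 1 first)
    scale_values -- scale value of each statement, in the same order
    """
    if len(keys) != len(scale_values):
        raise ValueError("keys and scale_values must have the same length")
    odd_total = 0.0
    even_total = 0.0
    for position, (key, value) in enumerate(zip(keys, scale_values), start=1):
        product = key * value
        if position % 2 == 1:
            odd_total += product    # odd-numbered statements
        else:
            even_total += product   # even-numbered statements
    return odd_total, even_total
```

For instance, `weighted_totals([5, 1, 3, 2], [8.5, 2.0, 6.0, 4.0])` totals 5(8.5) + 3(6.0) for the odd-numbered statements and 1(2.0) + 2(4.0) for the even-numbered ones.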


TABLE II.—RELIABILITIES* 
Forcing Freshmen into Extra-curricular Activities 


Predicted from 
Number of choices 
2 3 5 7 
Unweighted Scores 
.811 | .... | .915 | .9387 
.925 | .950 | .... | .980 
.875 | .912 | .947 
Weighted Scores 
.... | .983 | .958 | .968 
pL | .... | .942 | .960 
.860 | .903 | .... | .955 
.853 | .900 | .935 


* Figures in parentheses are the obtained reliabilities. 


C. Obtaining Correlations.—The generalized attitude scale having 
been made by combining the Forms A and B, reliability would be 
indicated by a correlation between the total scores on the two forms; 
that is, a correlation between the total of the odd-numbered state- 
ments and the total of the even-numbered statements for the same set. 
As each paper had been scored twice, once with equal values for all 
statements and once with weighted values, and the statements applied 
to two different practices, there resulted four correlations for each set 
of papers. 
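
The correlation between the odd-numbered and even-numbered totals can be computed with the ordinary product-moment formula. The sketch below is illustrative only; the function name is an assumption, and no data from this study are reproduced.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of totals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two cases")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold the odd-numbered totals and `y` the even-numbered totals for all papers in one set, under one method of scoring and one attitude object.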




TABLE III.—RELIABILITIES* 
Compulsory Semester Examinations 


Predicted from 


Number of choices 
2 3 5 7 


Unweighted Scores 


.... | .656 | .676 | .745 
.699 | .... | .854 | .892 
Weighted Scores 

.... | .850 | .904 | .929 
.709 | .786 | .860 


* Figures in parentheses are the obtained reliabilities. 


TABLE IV.—Z-VALUE DIFFERENCES AND CRITICAL RATIOS* 
Compulsory Extra-curricular Activities 


Predicted from 
2 3 5 7 
CR CR CR CR 
Unweighted Scores 
Seven-choice............... .28 | .14 | 2.0 
Weighted Scores 


* These differences are the differences between the obtained and the predicted 
reliabilities (using the z-transformation). 




APPLICATION OF THE FORMULA 


The Spearman-Brown prophecy formula was then applied to each 
of these obtained correlations, to prophesy the correlations for the 
other sets of forms. Tables II and III show the correlations obtained 
and predicted. In Table II, unweighted scoring, a correlation of .596 
was obtained on the two-choice form. From it were predicted correlations 
of .688, .788, and .838 for the three-choice, five-choice, and 
seven-choice forms, respectively. 
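
The prediction step can be illustrated with the standard Spearman-Brown formula, r_n = nr / [1 + (n - 1)r]. The sketch below assumes, as the reported figures suggest but the text does not state outright, that the lengthening factor n is the ratio of the numbers of choices on the two forms; the function names are hypothetical. Under that assumption the obtained .596 yields values within about .002 of the three predictions quoted above.

```python
def spearman_brown(r, factor):
    """Spearman-Brown prophecy: reliability after changing length by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


def predict_reliability(r_obtained, choices_from, choices_to):
    """Predict the reliability of a form with `choices_to` key numbers.

    Assumption (not stated in the text): the lengthening factor is the
    ratio of the numbers of choices, choices_to / choices_from.
    """
    return spearman_brown(r_obtained, choices_to / choices_from)


if __name__ == "__main__":
    for target in (3, 5, 7):
        print(target, round(predict_reliability(0.596, 2, target), 3))
```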

Because these correlations were high, they were corrected for 
skewness by being transformed to “z” functions (*, p. 215), 
and their differences were studied in the light of the standard errors 
of the differences. Tables IV and V tabulate the differences, 
their standard errors, and the resulting critical ratios. 


TABLE V.—Z-VALUE DIFFERENCES AND CRITICAL RATIOS* 
Compulsory Semester Examinations 


Predicted from 
2 3 5 7 
CR | CR | CR | CR 
Unweighted Scores 
.15 | 1.7 | .849 | .15 | 5.7 | .758 | .15 | 5.0 
... | .410 | .14 | 2.9 | .290 | .14 | 2.1 
Seven-choice............... .630 | .15 | 4.5 | .280 | .14 | 2.0 | .120 | .14 | 0.9 
Weighted Scores 
.190 | .15 | 1.3 | .130 | .16 | 0.8 | .200 | .15 | 1.3 
Three-choice............... ... | .360 | .15 | 2.4 | .410 | .14 | 2.9 
Seven-choice............... .14 | 1.3 | .400 | .14 | 2.9 


* These differences are the differences between the obtained and the predicted 
reliabilities (using the z-transformation). 


In Table IV, unweighted scores, the two-choice form had a z-trans- 
formation difference of .516 between its predicted correlation for the 
three-choice and the obtained correlation of the three-choice form. 
The standard error of this difference was .15, so that the critical ratio 
was 3.4. The difference between the obtained correlation for the 
five-choice form and the correlation predicted from the two-choice was 
1.01, which difference had a standard error of .15 with a resulting 
critical ratio of 6.7. 
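
The arithmetic behind these critical ratios can be sketched as follows. The z-transformation is Fisher's, z = arctanh r. The standard-error formula used here, the square root of 1/(n1 - 3) + 1/(n2 - 3) for two independent samples, is the usual textbook expression and is an assumption, since the article does not print the formula it used; the function names are likewise hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return math.atanh(r)


def se_z_difference(n1, n2):
    """Standard error of the difference between two independent z values.

    Assumed textbook formula; the article does not state its own.
    """
    return math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))


def critical_ratio(z_difference, standard_error):
    """Ratio of a z difference to its standard error."""
    return z_difference / standard_error


if __name__ == "__main__":
    # Differences and standard errors quoted in the text.
    print(round(critical_ratio(0.516, 0.15), 1))
    print(round(critical_ratio(1.01, 0.15), 1))
```

With groups of roughly 87 and 108 papers the assumed formula gives a standard error close to .15, which agrees with the order of magnitude of the standard errors reported.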

It is clear from Tables IV and V that the obtained reliabilities with 
the “unweighted” scores are not in accordance with the hypothesis to 
be tested. Even though the samples were rather small and the errors, 
therefore, correspondingly large, a considerable proportion of the 
differences are beyond the allowable sampling error. When, however, 
the agreement-disagreement scores are weighted in terms of the 
experimentally-determined scale values of the scale items, the data 
support the hypothesis. 


SUMMARY AND CONCLUSION 


This study was made to determine whether the change in reliability 
in multiple-choice tests, as related to the number of alternative choices 
per test item, is a function of the Spearman-Brown prophecy formula 
when used in multiple-choice forms. 

When two different methods of scoring, “weighted” and “unweighted,” 
were applied, it was found that with weighted scoring as 
the number of possible responses increased for each item the reliability 
increased and that the increase was in accord with the Spearman- 
Brown prophecy formula. It was found that when these same forms 
were scored without weighting the test items, the reliabilities increased 
as the number of possible responses increased, but not in accord with 
the Spearman-Brown prophecy formula. 


REFERENCES 


1. Guilford, J. P.: Psychometric Methods. McGraw-Hill Book Company, New 
York, 1936, p. 445. 

2. Lindquist, E. F.: A First Course in Statistics. Houghton-Mifflin Company, 
New York, 1938, p. 223. 

3. Lindquist, E. F.: Statistical Analysis in Educational Research. Houghton- 
Mifflin Company, Boston, 1940, p. 266. 

4. Remmers, H. H., and Denney, H. R.: “Reliability of Multiple-Choice Measuring 
Instruments as a Function of the Spearman-Brown Prophecy Formula, II.” 
Journal of Educational Psychology, Vol. XXXI, No. 9, December, 1940, pp. 
699-704. 

5. Remmers, H. H., and Ewart, E.: “Reliability of Multiple-Choice Tests as a 
Function of the Spearman-Brown Prophecy Formula, III.” Journal of 
Educational Psychology, Vol. XXXII, January, 1941, pp. 61-66. 




6. Remmers, H. H., Karslake, Ruth, and Gage, N. L.: “Reliability of Multiple-Choice 
Measuring Instruments as a Function of the Spearman-Brown Prophecy 
Formula, I.” Journal of Educational Psychology, Vol. XXXI, November, 
1940, pp. 583-590. 

7. Remmers, H. H., and Others: “Studies in Attitudes—A Contribution to Social-
psychological Research Methods.” Studies in Higher Education XXVI, 
Bulletin of Purdue University, Vol. XXXV, No. 4, December, 1934. 




VISUAL AND VISUAL-KINAESTHETIC LEARNING IN 
READING NONSENSE SYLLABLES 


MIRIAM FORSTER 
Psychology Department, University of Washington 


Does the value of the Fernald tracing method of learning to read 
depend, as Fernald believes, upon the additional kinaesthetic cues for 
associative learning or upon a fuller visual perception of printed 
symbols? Successful use of this tracing method of learning words has 
been frequently reported but nearly all of the evidence in its favor 
comes from its clinical use with cases of severe reading disability. 
Most data are in the form of case study reports. Recently, however, 
Berman has attempted to isolate the kinaesthetic factor and did not 
find it to be significant. Using seventeen subjects of from eight to 
fifteen years of age, all retarded two years or more on standard vocabu- 
lary tests, he had them learn ninety syllables chosen at random from 
Glaze’s list of nonsense syllables. There were two experimental groups 
which alternated daily the methods of visual-auditory and visual- 
auditory-motor learning. Similarly, forty-two geometric figures from 
a Gates reading test were learned. Berman found no significant dif- 
ferences between the two methods when his data were analyzed sta- 
tistically. He did not use any small-sample methods of analysis, 
however, and can only conclude that there may be important differ- 
ences in his data. In conclusion he says that greater economy of 
learning from the V-A-M method is shown by analysis of individual 
cases, thus bringing his argument back to a case study basis. Further- 
more, as he reports his study, he does not make clear that the experi- 
mental set-up was very well controlled especially as to length of 
exposure time of the symbols. 

Kirk reported a similar study with six subnormal boys, and found 
no significant advantage for tracing for trials to learn but a significant 
difference for retention. 

Fernald’s published reports are all of cases of severe reading retar- 
dation. She makes no recommendation of her method for general use. 
She writes of individuals with a certain type of make-up (perhaps even 
of brain-structure) who find it impossible to learn by utilization of 
visual and auditory cues alone, but who can learn by using a “kinaesthetic” 
approach. Similarly, both Gates and Monroe emphasize 
as the most common cause of failure to learn to read the failure to 
develop perceptual content from purely visual and auditory cues. 
Many of the symptoms of poor reading, such as reversals, inversions, 
substitutions, disappear if visual and auditory cues are supplemented 
by tactual and kinaesthetic cues. While Gates and Monroe use many 
supplementary methods and devices to aid in the development of clear, 
complete perception of printed symbols, Fernald appears to believe 
that many individuals are peculiar in their mode of reaction to such 
symbols. The learning of such individuals is blocked if they are not 
allowed adequate motor expression, what she calls “bodily adjustments.” 
Such an individual, she writes, actually needs to form the 
word with his hand and to vocalize it during the initial learning process. 
It is interesting to note, however, that even Fernald’s very extreme 
cases of disability soon discard the tracing technique and need only 
glance over a word as they articulate it in order to learn it. 

Fernald has had remarkable success with cases of extreme and 
apparently hopeless reading disability. Why this so-called kinaes- 
thetic approach to reading has succeeded when other methods have 
failed with such individuals is open to many interpretations. One of 
the obvious would seem to be the motivating power of the procedure. 
To the deeply discouraged child this novel technique offers new hope. 
It is an active approach. It is offered with a confident assurance that 
“it never fails.” Once a heretofore hopelessly blocked learner realizes 
that he can sit down and by means of repeated tracings learn two or 
three new words, the game is won. This motivating value of the 
method, of course, could be that of any technique offered as new and 
stimulating and sure to succeed. 

The Fernald method guarantees the child a clear perception of each 
word since the tracing ensures his “attending” to it. It affords practically 
no chance for incorrect responses to a word. It may also be that 
the addition of further kinaesthetic and tactual cues does provide for 
stronger and more specific conditioning to a printed symbol. The 
answer is not obvious. If the latter supposition is true, then the 
employment of tracing should make for better learning in any such 
situation. It was with the idea of exploring this hypothesis that 
the following experiments were set up. 


GENERAL PROCEDURE AND METHOD 


Two lists of nine words containing three phonic elements each were 
made up. In each group, designated as A and B vocabularies, there 
were three initial consonant sounds, three final consonant sounds and 
three vowel sounds. The words were so chosen that each phonic 
element occurred the same number of times and, hence, obtained the 
same amount of practice. Figure 1 presents these two vocabularies. 

The symbols representing these eighteen words were then photo- 
graphed on sixteen millimeter film for use in a small moving picture 
projector. By means of this projector and a mirror the words were 
flashed on to the ground-glass surface of a small window inserted in the 
top of the table before which the subjects were to sit. Exposure time 
of the symbols was automatically controlled. 

[Fig. 1. Vocabulary A and Vocabulary B: practice words and test words.] 

In the first experiment forty subjects were used, all undergraduates 
in beginning classes in psychology with the exception of two graduate 
students. These forty subjects formed four groups of ten each, to 
provide for all possible variations of order of presentation of the word 
lists. The time of exposure for each word was 2.3 seconds; the interval 
between exposures was one second. Each word appeared consecu- 
tively six times. The method to be used was explained to the subject 
and practiced by him using a flashcard with an unrelated word on it. 
The subject was instructed to say, or say and trace, each word just 
once each time it appeared but to continue to look at and study the 
word until it disappeared. 

After all the words had been practiced six times each the subject 
was immediately tested for recognition of the words. The words were 
flashed on for three 2.3-second intervals each in an order different from 
the practice order. Then the subject was told he would see three 
entirely new words written in the same symbolism and was encouraged 
to make an attempt to figure out what these words were. Each of the 
three new words was flashed on for six 2.3-second intervals. 

While running the above experiment the author soon came to 
suspect that the experimental set-up was not giving a fair chance for 
the tracing method to reveal its merits. The interval of exposure of 
2.3 seconds was entirely too short. It seemed desirable, therefore, to 
have an exposure time long enough for the subject to be able to get a 
good look at the word as well as to trace it. A five-second exposure was 
adopted for this second experiment, which gave each subject ample 
time to see the word before he traced it, to trace it, and usually to study 
it after he had traced it. Thus, whatever cues might be afforded by the 
tracing would be in addition to the cues present in the look-say method. 
The interval between exposures remained one second. 

A second group of only twenty subjects, five for each group, was put 
through a procedure identical to that of the first experiment but with 
this longer exposure time. Because of delay in construction of a new 
automatic timer, the timing was done manually with a stop-watch. 
Each word was exposed five times. When using the tracing method, 
the subjects were directed to trace the word at least once when trying 
to recognize it in the testing period. For testing the learning of the six 
practiced words an exposure of ten seconds was allowed. For testing 
recognition of the three unfamiliar words an exposure of fifteen seconds 
was used. 

It was obvious that those subjects who quickly realized that each 
word was made up of three phonic elements and set themselves to learn 
these elements had some advantage over the other subjects. A third 
experiment was tried to eliminate this variable. The procedure was 
identical with that in the second experiment except that each subject 
was told that each word he was to learn was made up of three phonic 
elements and was shown illustrative examples. Twenty subjects were 
used. An automatic timer again controlled the exposure of each word, 
which was five seconds. The interval between exposures was one 
second. 


TREATMENT OF RESULTS 


The criterion of learning was immediate recognition of each word, 
and the method of scoring responses was as follows: A word correctly 
recognized in toto was scored as three points, one point for each phonic 
element in it. Thus, the maximum score for each practice group of six 
words was eighteen points. A word partially correct obtained one or 
two points of credit. It was obvious from the data that the total 
number of correct responses for the B vocabulary was considerably less 
than that for the A vocabulary and that, whichever method was used, 
the method used second had an advantage over the method used first. 
Therefore, a correction of the individual scores was made for relative 
difficulty of vocabulary and for order of presentation of the two 
learning methods. Correct responses for the look-say and tracing methods 
were then totalled separately and the difference between their means found. 
Table I presents the results for the three experiments. 


TABLE I.—CORRECT RESPONSES TO SIX PRACTISED WORDS 

Experiment | Number of subjects | Mean correct responses, look-say (L-s) | Mean correct responses, tracing (T) | Difference (L-s - T) | t | p 
I   | 40 | 15.13 | 11.30 | 3.83 | 4.35 | less than .01 
II  | 20 | 16.75 | 14.09 | 2.66 | 2.42 | .028 
III | 20 | 13.27 | 12.08 | 1.19 | 3.87 | less than .01 


TABLE II.—CORRECT RESPONSES TO THREE UNPRACTISED WORDS 

Experiment | Number of subjects | Mean correct responses, look-say (L-s) | Mean correct responses, tracing (T) | Difference (L-s - T) | t | p 
I   | 40 | 4.94 | 4.24 | .70  | 1.47 | .148 
II  | 20 | 6.54 | 4.83 | 1.71 | 3.16 | less than .01 
III | 20 | 6.43 | 5.32 | 1.11 | 2.18 | .031 
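
The point system described above, one point for each phonic element correctly given, can be sketched as follows. The representation of a word as a tuple of three elements (initial consonant, vowel, final consonant), the position-by-position comparison, and the function names are assumptions made for illustration; the correction for vocabulary difficulty and order of presentation is not reproduced because its formula is not given.

```python
def score_response(target, response):
    """Points for one word: one per phonic element matched in position.

    target, response -- tuples of three phonic elements
    (hypothetical representation: initial consonant, vowel, final consonant).
    """
    if len(target) != 3 or len(response) != 3:
        raise ValueError("each word must have exactly three phonic elements")
    return sum(1 for t, r in zip(target, response) if t == r)


def score_word_list(targets, responses):
    """Total points over a list of words (maximum is 3 per word)."""
    return sum(score_response(t, r) for t, r in zip(targets, responses))
```

Under this scheme six words recognized in full earn eighteen points, and a word with one element wrong earns two.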

In the first experiment the difference between the means of correct 
responses for the two methods was 3.83 in favor of the look-say method. 
That is, these forty subjects made, on the average, 3.83 more correct 
responses when learning by the look-say method than they did using the 
tracing method. Use of the small-sampling formula for t showed this 
difference to be highly significant, with a probability of much less 
than one 
per cent that this difference occurred by chance. Similarly, with the 
second experimental group of twenty subjects the difference between 
the means of the number of correct responses for the two methods was 
2.66 in favor of the look-say method with the probability of only 2.8 per 
cent that this difference occurred by chance. With the third group, 
where the subjects were told that the words consisted of three phonic 
elements, the difference between the means was smaller, 1.19, but it was 
a highly significant difference with a probability of less than 1 per cent 
that it occurred by chance. 

Comparable differences in favor of the look-say method were found 
from an analysis of responses to the three unfamiliar test words pre- 
sented after the test on the six practiced words in each experiment. 
Table II presents these results. 


DISCUSSION OF RESULTS 


In these three exploratory experiments, then, the additional cues 
provided by tracing did not prove to be an aid to learning as tested by 
immediate recognition. This is, of course, no condemnation of the 
Fernald method itself. The results may provide, however, some sup- 
port for the hypothesis that the success of the Fernald method of learn- 
ing to read is not due to reinforcement by kinaesthetic cues. This 
leaves us still with the hypothesis that the Fernald technique is par- 
ticularly valuable because it gives the child a clear and more complete 
perception of words that he would not otherwise examine, and because 
all his responses to a word will be correct ones. 

We have not proved this hypothesis, of course. To do so it 
will be necessary to provide an experimental set-up with children 
in which the tracing method can be compared with another non- 
tracing method which is just as good at “getting the attention” of 
the child. 

It is recognized, of course, that this experimental procedure was not 
exactly comparable to the teaching method used by Fernald. In the 
first place, no test was made of relative learning with and without 
written reproduction of the words after the copy had been withdrawn. 
The introduction of this variable would throw light on the success of 
the Fernald method. Secondly, these subjects were all adults with 
habits of learning fairly well established. Tracing may be a distraction 
rather than a facilitation for adult learning. 




REFERENCES 


Berman, A.: “The Influence of the Kinaesthetic Factor in the Perception of 
Symbols in Partial Reading Disability.” Journal of Educational Psychology, 
Vol. XXX, No. 3, 1939, pp. 187-198. 

Fernald, G. M.: On Certain Language Disabilities. Mental Measurement Mono-
graphs, No. 11, 1936, Williams & Wilkins Co., Baltimore. 

Gates, A. I.: The Improvement of Reading. New York: Macmillan Co., 1935. 

Kirk, S. A.: “The Influence of Manual Tracing on the Learning of Simple Words in 
the Case of Subnormal Boys.” Journal of Educational Psychology, Vol. XXIV, 
No. 7, 1933, pp. 525-535. 

Monroe, M.: Children Who Cannot Read. Chicago: University of Chicago Press, 
1932. 




AN EXTENSION OF THE DOOLITTLE METHOD TO 
SIMPLE REGRESSION PROBLEMS 


ROBERT J. WHERRY 
University of North Carolina 


The writer has for several years been teaching an extension of the 
Doolittle method adapted to simple linear regression problems to 
students in elementary psychological statistics. The success of the 
method in such classes, the expressed interest of numerous statisticians, 
and the fact that there seems to be no reference to any such method in 
the literature lead to the writing of this report. 

The approach is through the regression equation and the method of 
least squares. If we want to fit the best straight line to data for two 
variables, we have an equation of the form 


$$\hat{Y} = a + bX, \tag{I}$$


where $\hat{Y}$ is the predicted, dependent variable and $X$ the independent 
variable. According to the least squares criterion the problem is to 
make 

$$\Sigma(\hat{Y} - Y)^2 = \Sigma(a + bX - Y)^2 \tag{II}$$


a minimum. This will be accomplished through the solution of 
normal equations obtained by differentiating equation (II) with 
respect to a and then with respect to b, and setting the resulting 
derivatives equal to zero. Doing this yields 


$$aN + b\Sigma X - \Sigma Y = 0 \tag{III}$$


and 

$$a\Sigma X + b\Sigma X^2 - \Sigma XY = 0 \tag{IV}$$


To these may be added a third equation obtained by expanding equa-
tion (II) and substituting equations (III) and (IV) in the resulting 
expansion, which yields 


$$-a\Sigma Y - b\Sigma XY + \Sigma Y^2 = \Sigma e^2_{y.x} \tag{V}$$
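
The expansion that leads to equation (V) is not written out in the text; one way of carrying it through is sketched here. The grouping of terms is an illustrative choice, and $e = a + bX - Y$ denotes the error of estimate, so that the left side is the sum of the errors squared.

```latex
\begin{aligned}
\Sigma e^2_{y.x} &= \Sigma (a + bX - Y)^2 \\
  &= a\,\Sigma(a + bX - Y) + b\,\Sigma X(a + bX - Y) - \Sigma Y(a + bX - Y) \\
  &= a\,\bigl(aN + b\Sigma X - \Sigma Y\bigr)
     + b\,\bigl(a\Sigma X + b\Sigma X^2 - \Sigma XY\bigr)
     - a\Sigma Y - b\Sigma XY + \Sigma Y^2 \\
  &= -a\Sigma Y - b\Sigma XY + \Sigma Y^2 .
\end{aligned}
```

The two parenthesized expressions in the third line are the left sides of equations (III) and (IV) and therefore vanish, which leaves equation (V).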


Many writers have developed these three equations and suggested 
their solution as a means of obtaining the regression coefficients a and 
b and the sum of the errors squared (this latter quantity being used to 
obtain the standard error of estimate and/or the correlation coefficient). 
Now there are many ways of solving such simultaneous equations, 
and the problem here is so simple as to require no particular 
method. When, however, the writer applied one such method, the 
Doolittle method, very gratifying results were obtained. 

While most people use r’s (after Garrett) or p’s (after Tolley and 
Ezekiel) in the Doolittle procedure as applied to multiple correlation, 
there is no reason why gross-scores cannot be applied even then. 
Certainly no new procedures are involved in applying the method to 
this special case in so far as equations (III) and (IV) are concerned. 

With respect to equation (V), however, we do have a real modifica- 
tion or extension of the customary Doolittle procedure. The writer 
as early as 1930 (private communication to Dr. H. A. Toops) had 
discovered that in multiple correlation problems the addition of an 
analogous equation to the usual normal equations permitted obtaining 
directly the values of $\Sigma e^2$, $S^2$, or $1 - R^2_{0.12 \ldots m}$ depending upon 
whether one used gross scores, p’s, or r’s, respectively, in the equations. 
The extra equation is simply added on at the end of the usual method 
and treated as an extra regular equation would be handled. The only 
point to be remembered is that the equation is equal to $\Sigma e^2$ (when 
gross scores are used, as here) rather than to zero as in the normal 
equations. 

In order to make reference to specific sections of the Doolittle 
solution the author has adopted a system of what he calls T-values, 
or tabular entries, each such value representing the entry in some cell 
of that solution. The columns are headed a, b, and c, where the a 
and b correspond to the same letters in the equations and the c stands 
for the entry of the constant values. The rows in the table are doubly 
labeled: (1) with a letter, a, b, or c, corresponding to the value being 
solved for in that section, and (2) with a number or capital letter, 1, 2, 
3, A, or R, where the numbers refer to the order in which the row was 
obtained, the A stands for Added Row or the row obtained by adding 
up the numbered rows, and R stands for Reciprocal Row or the row 
obtained by multiplying the A row by the negative reciprocal of its 
own first entry. A T-value is formed by using first a column heading 
and then a row heading as subscripts for the T. Thus $T_{cb.R}$ would 
indicate the entry in the Doolittle table in the c column and in the 
b-R row. The use made of these values will be seen in the Back 
Solution Section of Tables I and II. If the reader is interested in 
carrying out the algebraic processes in Table I, he will find a complete 
job-analysis of the procedures at the bottom of Table II. This job-
analysis will also help the reader understand the labeling of the rows 
in case the matter is still not clear. 


Since there are many surprising uses to be made of these internal 
tabular values, and since these would be hidden in an ordinary numeri- 
cal solution, a complete literal solution is given in Table I. 


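TABLE I.—LITERAL SOLUTION 

Row | a | b | c 
a-A | $N$ | $\Sigma X$ | $-\Sigma Y$ 
a-R | $-1$ | $-\Sigma X/N$ | $\Sigma Y/N$ | ($R_a = -1/N$) 
b-1 |  | $\Sigma X^2$ | $-\Sigma XY$ 
b-2 |  | $-(\Sigma X)^2/N$ | $\Sigma X\,\Sigma Y/N$ 
b-A |  | $\dfrac{N\Sigma X^2 - (\Sigma X)^2}{N}$ | $-\dfrac{N\Sigma XY - \Sigma X\,\Sigma Y}{N}$ 
b-R |  | $-1$ | $\dfrac{N\Sigma XY - \Sigma X\,\Sigma Y}{N\Sigma X^2 - (\Sigma X)^2}$ | ($R_b = \dfrac{-N}{N\Sigma X^2 - (\Sigma X)^2}$) 
c-1 |  |  | $\Sigma Y^2$ 
c-2 |  |  | $-(\Sigma Y)^2/N$ 
c-3 |  |  | $-\dfrac{(N\Sigma XY - \Sigma X\,\Sigma Y)^2}{N[N\Sigma X^2 - (\Sigma X)^2]}$ 
c-A |  |  | $\Sigma Y^2 - \dfrac{(\Sigma Y)^2}{N} - \dfrac{(N\Sigma XY - \Sigma X\,\Sigma Y)^2}{N[N\Sigma X^2 - (\Sigma X)^2]}$ 

Back Solution 

(1) $b = T_{cb.R}$ (tabular entry in c column, b-R row) 
(2) $a = T_{ba.R}\,b + T_{ca.R}$ 
(3) Hence $\hat{Y} = a + bX$ 
(4) $r_{xy} = \sqrt{\dfrac{-T_{cc.3}}{T_{cc.1} + T_{cc.2}}}$ (to be given the same sign as $T_{cb.R}$) 
(5) $S_{y.x} = \sqrt{T_{cc.A}(-R_a)}$ 
(6) $M_x = -T_{ba.R}$ 
(7) $M_y = T_{ca.R}$ 
(8) $\sigma_x = \sqrt{T_{bb.A}(-R_a)}$ 
(9) $\sigma_y = \sqrt{(T_{cc.1} + T_{cc.2})(-R_a)}$ 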


Equations (1) through (9) at the bottom of Table I should be 
verified by showing that the tabular entries cited are indeed equal 
to the various constants. 

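The easiest in this respect are (6) and (7). In the regular gross 
score notation they are: 

$$M_x = \frac{\Sigma X}{N} = -\left(\frac{-\Sigma X}{N}\right), \tag{6a}$$

and 

$$M_y = \frac{\Sigma Y}{N}, \tag{7a}$$

which are seen immediately to equal the tabular values. 

In equations (8) and (9), the customary gross-score equations are: 

$$\sigma_x = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2} = \sqrt{\frac{N\Sigma X^2 - (\Sigma X)^2}{N}\cdot\frac{1}{N}}, \tag{8a}$$

and similarly for variable Y. 

In equation (4) we have customarily 

$$r_{xy} = \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}} = \sqrt{\frac{(N\Sigma XY - \Sigma X\,\Sigma Y)^2}{N[N\Sigma X^2 - (\Sigma X)^2]} \Big/ \left[\Sigma Y^2 - \frac{(\Sigma Y)^2}{N}\right]} \tag{4a}$$

For (5) the customary formula is 

$$S_{y.x} = \sigma_y\sqrt{1 - r^2_{xy}}, \tag{5a}$$

but from (4a) and (9a) we have by substitution 

$$S_{y.x} = \sqrt{(T_{cc.1} + T_{cc.2} + T_{cc.3})(-R_a)} = \sqrt{T_{cc.A}(-R_a)}.$$

For the regression equation coefficients, equations (1) and (2), the 
usual gross-score expression is 

$$\hat{Y} = r_{xy}\frac{\sigma_y}{\sigma_x}X + \left(M_y - r_{xy}\frac{\sigma_y}{\sigma_x}M_x\right).$$

Substitution of equations 6a, 7a, 8a, 9a, and 4a yields, after reduction 
and simplification: 

$$\hat{Y} = \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{N\Sigma X^2 - (\Sigma X)^2}\,X + \left[\frac{\Sigma Y}{N} - \frac{\Sigma X}{N}\cdot\frac{N\Sigma XY - \Sigma X\,\Sigma Y}{N\Sigma X^2 - (\Sigma X)^2}\right] \tag{1a, 2a, 3a}$$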


While all of this algebra may discourage the reader, it must be 
remembered that an actual numerical example requires nothing but 
substitution and a little arithmetic. Such an example will perhaps 
make the whole procedure clearer. Let us assume the constants 


$$\Sigma X = 120,\quad \Sigma Y = 160,\quad \Sigma XY = 1000,$$
$$\Sigma X^2 = 800,\quad \Sigma Y^2 = 1300,\quad N = 40.$$

Substituting these values in equations III, IV, and V we have 

40a + 120b - 160 = 0    (S = 40 + 120 - 160 = 0)    (IIIa) 
120a + 800b - 1000 = 0    (S = 120 + 800 - 1000 = -80)    (IVa) 
-160a - 1000b + 1300 = $\Sigma e^2_{y.x}$    (S = -160 - 1000 + 1300 = 140)    (Va) 

The Doolittle solution with all calculations (including the checking 
column which was omitted in Table I) is given in Table II. 


TABLE II 

Job-analysis step | Row | a | b | c | Check (S) 
(1)  | a-A | 40 | 120 | -160 | 0 
(2)  | a-R | -1 | -3 | +4 | 0 ✓ ($R_a$ = -.025) 
(3)  | b-1 |  | 800 | -1000 | -80 
(4)  | b-2 |  | -360 | 480 | 0 
(5)  | b-A |  | 440 | -520 | -80 ✓ 
(6)  | b-R |  | -1 | 1.1818 | .1818 ✓ ($R_b$ = -.0022727) 
(7)  | c-1 |  |  | 1300 | 140 
(8)  | c-2 |  |  | -640 | 0 
(9)  | c-3 |  |  | -614.54 | -94.54 
(10) | c-A |  |  | 45.46 | 45.46 ✓ 

Back Solution 

(1) $b = T_{cb.R} = 1.1818$ 
(2) $a = T_{ba.R}\,b + T_{ca.R} = -3(1.1818) + 4 = .4546$ 
    Check: Substitute a and b in equation IVa. 
    120(.4546) + 800(1.1818) - 1000 = 999.992 - 1000 = 0 (nearly) 
(3) Thus $\hat{Y} = .4546 + 1.1818X$ 
(4) $r_{xy} = \sqrt{-T_{cc.3}/(T_{cc.1} + T_{cc.2})} = \sqrt{614.54/660} = +.965$ (sign same as $T_{cb.R}$) 
(5) $S_{y.x} = \sqrt{T_{cc.A}(-R_a)} = \sqrt{45.46(.025)} = 1.067$ 
(6) $M_x = -T_{ba.R} = -(-3) = 3$ 
(7) $M_y = T_{ca.R} = 4$ 
(8) $\sigma_x = \sqrt{T_{bb.A}(-R_a)} = \sqrt{440(.025)} = 3.317$ 
(9) $\sigma_y = \sqrt{(T_{cc.1} + T_{cc.2})(-R_a)} = \sqrt{(1300 - 640)(.025)} = 4.062$ 

Job Analysis 

(1) Copy coefficients of equation III in row a-A. 
(2) Divide -1 by $T_{aa.A}$, obtaining $R_a$. Multiply row a-A by $R_a$, recording the 
    products in row a-R. Check (a + b + c = S). 
(3) Copy coefficients of equation IV in row b-1. 
(4) Multiply row a-A by $T_{ba.R}$, recording in row b-2. 
(5) Add rows b-1 and b-2, recording in row b-A. Check (b + c = S). 
(6) Divide -1 by $T_{bb.A}$, obtaining $R_b$. Multiply row b-A by $R_b$, recording 
    products in row b-R. Check (b + c = S). 
(7) Copy coefficients of equation V in row c-1. 
(8) Multiply row a-A by $T_{ca.R}$, recording products in row c-2. 
(9) Multiply row b-A by $T_{cb.R}$, recording products in row c-3. 
(10) Add rows c-1, c-2, and c-3, recording sums in row c-A. Check (c = S). 
(11) Proceed to substitute values in nine numbered steps of back solution. 


Anyone wishing to use the method can cut a stencil with the normal 
equations, the Wherry-Doolittle table, the substitution equations, and 
the job-analysis steps. All should fit onto a legal sized piece of paper 
easily. 
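
For readers who prefer a mechanical check, the job-analysis steps can also be written as a short program. The sketch below follows the row-by-row procedure for the two-variable case; the function name, argument names, and dictionary keys are hypothetical, the checking column is omitted, and the sign of r is taken from b as the back solution directs. It assumes that neither variable has zero variance.

```python
import math


def wherry_doolittle(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Simple regression constants from gross-score sums, row by row."""
    # Section a: equation (III) and its reciprocal row.
    a_A = [n, sum_x, -sum_y]
    r_a = -1.0 / a_A[0]
    a_R = [v * r_a for v in a_A]              # -1, -sum_x/n, sum_y/n

    # Section b (columns b and c): equation (IV) plus row a-A times T_ba.R.
    b_1 = [sum_x2, -sum_xy]
    b_2 = [a_A[1] * a_R[1], a_A[2] * a_R[1]]
    b_A = [b_1[0] + b_2[0], b_1[1] + b_2[1]]
    r_b = -1.0 / b_A[0]
    b_R = [v * r_b for v in b_A]              # -1, regression slope

    # Section c (column c only): equation (V) plus the two products.
    c_1 = sum_y2
    c_2 = a_A[2] * a_R[2]                     # row a-A times T_ca.R
    c_3 = b_A[1] * b_R[1]                     # row b-A times T_cb.R
    c_A = c_1 + c_2 + c_3                     # sum of squared errors

    # Back solution, steps (1) through (9).
    b = b_R[1]
    a = a_R[1] * b + a_R[2]
    r = math.copysign(math.sqrt(-c_3 / (c_1 + c_2)), b)
    return {
        "a": a,
        "b": b,
        "r": r,
        "sum_e2": c_A,
        "s_yx": math.sqrt(c_A * -r_a),
        "mean_x": -a_R[1],
        "mean_y": a_R[2],
        "sigma_x": math.sqrt(b_A[0] * -r_a),
        "sigma_y": math.sqrt((c_1 + c_2) * -r_a),
    }


if __name__ == "__main__":
    # Constants of the numerical example in the text.
    for name, value in wherry_doolittle(40, 120, 160, 1000, 800, 1300).items():
        print(name, round(value, 4))
```

Run on the constants of the numerical example, the program can be compared entry by entry with the back solution of Table II; small differences in the last decimal place are to be expected from rounding in the hand calculation.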

Advantages of the new method of solution for simple regression 
constants are: 

(1) The Wherry-Doolittle method is shorter because it actually 
involves fewer arithmetical operations (if checking be required for all 
methods). 

(2) The Wherry-Doolittle method is shorter because it is more 
systematic, not requiring frequent shifts of set from one type of formula 
to another. 

(3) The checks are more certain and convincing since resolving 
a formula is always apt to lead to repetition of the same errors, while 
the Doolittle checks are not subject to this type of error. 

(4) The replacing of a variety of simple and complex (to the 
beginner) formulae by a single simple self-checking method should 
simplify the training of statistical clerks. 

(5) When the beginner has mastered the technique involved he is
immediately able to solve multiple correlation constants in precisely
the same manner with little further training, a condition which is not
true as a result of the other approach.




THE ELEMENT OF HABIT IN PERSISTENCE 


JOHN J. B. MORGAN AND VIOLET Z. LANNERT 
Northwestern University 


Is there an element of habit in persistence? If a child succeeds in 
solving a simple maze problem, will he tend to work longer and more 
diligently on a more difficult problem than he would have done had he 
not succeeded in the simpler maze? If a child continues to succeed in 
the solution of more and more difficult mazes, will he thereby develop 
a set or attitude which will manifest itself in more assurance when 
confronted with any maze problem? Such an attitude might be called 
the habit of success. If a child develops the habit of success in solving 
mazes will this habit carry over into other activities to such a degree 
that he will build a generalized habit of success? 

Some writers have maintained that, when a child shows unusual 
energy when confronted with a difficult problem, this manifestation 
denotes an escape from an inferiority feeling. That is, fear of failure 
(for these writers) is the dominant drive which incites a child to work 
harder when difficult situations present themselves. Evidence for 
this assertion is derived mostly from clinical material. For example, 
if a child who becomes agitated when thwarted is studied, it may be 
found that he has been humiliated in some manner. This is taken as 
evidence that he has an “inferiority complex.” Having found clinical
evidence for an inferiority feeling, the investigator is likely to assume
that this is the essential motivating force in the child. The degree
of zeal he shows is taken to be a measure of the potency of his “inferiority
complex.” What appears, on the surface, to be a manifestation
of self-confidence, is interpreted as an attempt to escape failure. It 
is likely that such an interpretation is correct in many cases. Is it not 
possible, nevertheless, for a child to develop a wholesome habit of 
success which is not a compensation for a fear of failure? 

This paper reports the result of an experiment designed to give a 
partial answer to this question. It presents but a narrow segment of 
the whole issue. It does not attempt to generalize and to say whether 
or not success in one line of activity will lead to more persistence in the 
attempt to solve a problem in quite a different realm. Before this 
larger question can be answered, it must be shown that success in 
one type of pursuit will bring about more persistence in similar 
activities. If this can be demonstrated, later investigations may be 
able to show whether or not this habit of persistence can be transferred 



to other activities and, if this effect is found, to measure the extent 
of such transfer. 

The procedure in this experiment was to select two groups of 
children, equated in sex, age, and intelligence. Each member of one 
group was paired with a member of the control group, as nearly as 
possible, in all these respects before any experimental work was done. 
Group A consisted of twelve boys and thirteen girls who were paired 
with twelve boys and thirteen girls in Group B. The experimenter 
then filed the record of these pairings until the experimental material 
was completely gathered, so that knowledge of whether a child was in 
the experimental or in the control group could not influence the judg- 
ment of the experimenter. 

The apparatus used in the experiment was the persistence maze 
described elsewhere.¹ This maze can be manipulated in such a manner
that the child is confronted with maze problems of varying difficulty 
by the simple expedient of changing blocks in various pathways. 
The members of Group A (the experimental group) were given training 
in persistence by learning maze A, then by learning maze B, and 
finally maze C. Group B (the control group) was given none of this 
preliminary training. Both groups were then given the last problem 
for which there was no solution. The hypothesis behind this proce- 
dure was: If the preliminary training in success is effective, it should 
show itself in a different kind of performance when the experimental 
group is given the task of solving a very difficult problem. When 
confronted with an unsolvable problem, will individuals who have 
experienced success work longer or perform better than control indi- 
viduals who have had no such preliminary training? 

The subject, blindfolded, was seated at a table with the board 
directly before him, and the experimenter then took his hand and 
guided it through a part of the groove in order that he could learn the 
nature of the board before the test proper began. Since the subject 
was blindfolded, he could see no portion of the maze, and could tell 
that he had solved the problem only by sensing when the stylus dropped
into a hole. The performance of each child was timed and rated 
according to the following nine-point scale: 


1 Morgan, John J. B., and Banker, Mary H.: “The relation of mental stamina
to parental protection.” J. genet. Psychol., Vol. LII, 1938, pp. 347-360. Morgan,
John J. B., and Hull, L.: “The measurement of persistence.” J. appl. Psychol.,
Vol. X, 1926, pp. 180-187.




1. Careless—anxious to quit the task. 

2. Excuse hunter—readily gives some excuse to get out of the task— 
feels badly, eyes hurt, time is valuable, etc. 

3. Fiddling plodder—keeps working because he apparently has not enough 
initiative to try harder or to quit—follows the line of least resistance. 

4. Intermittent worker—goes by spurts, working hard and then having
periods of fiddling.


Fig. 1.—The Morgan-Hull persistence maze.


In Problem 1, a barrier is placed at (1) and the home pocket is located at A. This 
device cuts off most of the maze and offers a very simple problem to the subject. He is 
permitted to repeat this maze until he makes two successful runs. In Problem 2, a 
barrier is set at each of the points marked (2); one of these cuts off part of the maze and 
the other one closes the home pocket at A. The home pocket for this problem is at B. 
Thus, the subject has to unlearn his first problem and to learn one somewhat more 
difficult. In Problem 3, barriers are set at the three points marked (3), thus leaving 
open the home pocket C and blocking off some of the maze. In Problem 4, the home 
pockets A, B, and C are all closed and the entire maze is opened. This last problem 
cannot be solved—there are no home pockets open. 


5. Works hard, but has little insight. Works hard but with little intelligence.
Never suspects that the maze cannot be worked. Works in a blind fashion.

6. Persistent worker with some method or definite attempt to reach the goal.

7. Persistent worker with some insight. Probably tries two or three
different methods of reaching goal but shows increasing discouragement.

8. Tenacious, obstinate worker—more determined to succeed because of
the obstacles. Failure acts as a challenge—the greater the difficulty the
harder he works.

9. Analytical worker—intelligently persistent to the extent that he fully
analyzes the problem. Presents data or reasons why he thinks that the maze
cannot be solved.


Although this scale is qualitative, different subjects can be rated 
by means of it with considerable accuracy. 

The sex, time in minutes, rating, age, and Kuhlmann-Anderson 
IQ for each subject are given in Table I. The odd-numbered cases 
are paired with the even-numbered cases and the differences in time 
and rating are given for each pair in the last column. 

An examination of this table will show that those individuals who 
had experienced success in maze running manifested improved per- 
formance when confronted with an unsolvable maze of a kind similar 
to those with which they had practiced. This improvement is very 
striking both in persistence time and in quality of performance and 
is indicated no matter how the scores of the experimental and control 
groups are compared. 

In the first place, of the twenty-five pairs of subjects, twenty-one 
who had been trained in successful performance worked longer with 
the unsolvable problem, while only four of the untrained children 
worked longer on the final test. Only three of the untrained group 
excelled the paired member of the trained group in quality of per- 
formance as rated by the experimenter on the qualitative persistence 
scale. 

Our figures show that, as a group, individuals who had training in 
success worked significantly longer at the unsolvable problem than 
did those who had no such preliminary training. The average time 
spent on the unsolvable problem by the experimental group was 103.8 
minutes compared with an average time of 63.4 minutes by the control 
group. The difference of 40.4 minutes between these two averages has 
a standard error of the difference of 11.6, giving a critical ratio of 3.5. 
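The critical ratio quoted here is the difference between the two averages divided by the standard error of that difference. A minimal sketch of the arithmetic follows, using only the figures reported in the text; the function name is hypothetical, and the standard error of 11.6 is taken as given, since the article does not say how it was computed.

```python
def critical_ratio(mean_a: float, mean_b: float, se_diff: float) -> float:
    """Difference between two means expressed in standard-error units."""
    return (mean_a - mean_b) / se_diff


# Average persistence times in minutes, as reported for the two groups.
cr_time = critical_ratio(103.8, 63.4, 11.6)
print(round(cr_time, 1))
```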

A comparison of the ratings in quality of work gives even more 
striking evidence of the effectiveness of practice in success. The 
average rating of the experimental group was 6.7 compared with an 
average rating of 3.4 received by the control group. The difference 




TABLE I.—SHOWING THE DIFFERENCES IN PERSISTENCE TIME AND IN QUALITY OF
WORK BETWEEN INDIVIDUALS IN EXPERIMENTAL AND CONTROL GROUPS

          Group A                          |          Group B                          | Differences
          Trained to succeed               |          Untrained in success             | between pairs
 Case    IQ     Age    Time in   Rating    |  Case    IQ     Age    Time in   Rating   | Time in   Rating
                       minutes             |                        minutes            | minutes

 Girls
   1   [1383]            45        7       |    2    125    13-0      42        4      |     3        3
   3    119    13-0      18        1       |    4    119    13-1      70        3      |   -52       -2
   5    117    13-7      81        7       |    6    118    13-5      25        2      |    56        5
   7    116    13-6      55        9       |    8    117    13-0       8        1      |    47        8
   9    114    14-6     106        8       |   10    113    13-9      27        1      |    79        7
  11    113    13-6      73        6       |   12    113    13-4      55        4      |    18        2
  13    112    13-11     77        9       |   14    112    13-11     56        3      |    21        6
  15    111    13-0      74        7       |   16    111    13-2      69        5      |     5        2
  17    106    13-8      65        6       |   18    107    13-6      53        4      |    12        2
  19    105    13-6     113        8       |   20    106    13-5      49        3      |    64        5
  21    104    13-7      85        7       |   22    104    13-7      52        4      |    33        3
  23     94    14-0     102        5       |   24     93    13-11     54        2      |    48        3
  25     84    15-0     121        8       |   26     89    14-2      32        1      |    89        7

 Boys
   1    140    12-3      47        9       |    2    131    12-7      90        8      |   -43        1
   3    124    13-3      25        1       |    4    125    13-11     43        4      |   -18       -3
   5    114    13-8     150        8       |    6    118    13-4      87        7      |    63        1
   7    112    13-7     143        9       |    8    112    13-5     125        3      |    18        6
   9    108    13-2     158        6       |   10    108    13-7      36        1      |   122        5
  11    103    13-4     146        8       |   12    102    13-7      99        2      |    47        6
  13     99    13-8     174        8       |   14    100    13-6      72        3      |   102        5
  15     96    14-0     177        6       |   16     96    13-10     91        5      |    86        1
  17     92    14-9      42        4       |   18     95    14-11    106        5      |   -64       -1
  19     91    15-0     161        7       |   20     91    15-4      94        3      |    97        4
  21     88    14-4     172        6       |   22     88    14-4      43        2      |   129        4
  23     86    14-6     167        8       |   24     86    14-7      94        5      |    73        3

 Average                103.8      6.7     |                          63.4      3.4    |    40.4      3.3




between these average ratings of 3.3 points had a standard error of the 
difference of .56, giving a critical ratio of 5.9. 

These critical ratios of 3.5 and 5.9 indicate practical certainty that
the true differences in time and in quality of work between the “success”
group and the control group are greater than zero.
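One way to read "practical certainty" is to ask how often a ratio this large would arise by chance if the true difference were zero. The sketch below is an illustration only, on the assumption (not stated in the article) that the critical ratio behaves like a standard normal deviate; the function name is hypothetical.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable Z (assumed distribution)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Critical ratios reported for persistence time and for quality rating.
for label, cr in (("time", 3.5), ("rating", 5.9)):
    print(label, cr, upper_tail(cr))

# The rating ratio itself follows from the reported difference and its standard error.
cr_rating = 3.3 / 0.56
```

Under that assumption the chance probability is on the order of two in ten thousand for the time difference and far smaller for the rating difference.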


SUMMARY 


A group of twenty-five children were paired with a control group 
in age, sex, and intelligence. Each individual in the experimental 
group was given practice in the successful running of mazes of increas- 
ing difficulty. The control group received no such practice. Both 
groups were then permitted to work on a maze for which there was no 
solution. Records were kept of the length of time that each child 
persisted in his efforts to solve the unsolvable maze, and a rating as to 
the quality of his work was made. 

A comparison of each child in the experimental with a matched 
individual in the control group shows that those children who received 
successful training in maze running tended to work longer and to do 
better work than children who had received no such training, when 
confronted with a maze for which there was no solution. Statistical 
treatment of the results indicates practical certainty that there is a 
true difference between the two groups. 




LANGUAGE DIFFICULTIES OF THE 
BERNREUTER PERSONALITY INVENTORY 


PETER HAMPTON 
University of Manitoba 


The large number of researches so far undertaken with the Bernreuter
Personality Inventory gives little indication of the language
difficulties encountered by the subjects taking the test.
The experience of the author over a period of three years, during 
which time he used the Bernreuter Personality Inventory quite 
extensively with retail grocers, leads him to the conclusion that there 
are such difficulties. In order, therefore, that the use of the Inventory 
may be successfully extended to people who do not have a college or 
even a high-school education, certain changes in the choice of words 
and phrases used in the Bernreuter Personality Inventory might be 
indicated. The author realizes what a tremendous task it would be to 
revise and restandardize the Bernreuter Personality Inventory. But 
for the sake of improving psychological methodology, a refinement of 
many of our tests will have to be undertaken eventually. And so a 
few suggestions based on an investigation of language difficulties in 
the Bernreuter Personality Inventory may be in order. 

The author administered the Bernreuter Personality Inventory 
individually to seventy retail grocers, chosen at random from among 
eight hundred Winnipeg grocers listed in the telephone directory. 
Eleven nationalities were represented: Hebrew thirty-two per cent, 
English twenty-seven per cent, Scotch fifteen per cent, Irish ten per 
cent, Italian three per cent, French three per cent, Greek three per cent, 
Swedish two per cent, German two per cent, Belgian one per cent, and 
Chinese one per cent. The subjects could all read, write, and speak 
English, although only in a few cases did their education extend beyond 
the high-school level. Both sexes were represented. Men, however, 
made up eighty-two per cent of the group. The ages ranged from 
eighteen to sixty-five years, with a median age of forty years. The 
subjects were encouraged to ask the meaning of any word they did not 
understand; whereupon, the investigator defined the word in accord- 
ance with the definition given in the Funk and Wagnalls College 
Standard Dictionary. 

The following is a list of words and phrases together with the per-
centage of subjects who did not know what they meant: “Unconven-
tional” twenty per cent, “domineering” ten per cent, “radical”



ten per cent, “day-dream” seven per cent, “upbraid” six per cent,
“stimulating” nineteen per cent, “alternate” seventeen per cent,
“stage-fright” six per cent, “self-conscious” fifteen per cent,
“optimistic” eleven per cent, “affected” ten per cent, “motives” seven
per cent, “solicited” two per cent, “reluctant” seven per cent,
“gratify” one per cent, “apparent” one per cent, “feelings of
inferiority” six per cent, “low spirits” eleven per cent, “intellectual affairs”
seven per cent, and “emotional stress” one per cent.
With the assistance of Roget’s Thesaurus of English Words and 
Phrases, the investigator substituted a synonym for each word not 
understood by the subjects. Wherever possible the simplest synonym 
was chosen, much care being exerted to get a synonym which was as 
close as possible in meaning to the original word. Having substituted 
the synonym words for the words not understood, the Bernreuter 
Personality Inventory was administered individually to a second 
group of retail grocers. The second group was not as large as the 
TABLE I.—WORDS AND PHRASES FROM THE BERNREUTER PERSONALITY INVENTORY
NOT UNDERSTOOD BY A NUMBER OF GROCERS, THE SYNONYMS CHOSEN
TO REPLACE THEM, AND FREQUENCY RATINGS

Words and phrases            Frequency    Synonyms substituted               Frequency
not understood               ratings      for words and phrases              ratings

Unconventional                  9         *Non-conforming                       8
Domineering                    15         …                                     8
…                              2b         Fear of appearing before public      1a
…                              17         Hope for the best                    1a
…                              3b         Reasons for doing things             1a
Feelings of inferiority        14         *Feelings of deficiency               7
…                              18         *Low state of mind                   1a
Intellectual affairs           2b         Intellectual undertakings            3b
Emotional stress                9         Emotional strain                     3a




first group, consisting of only forty-five subjects. Nevertheless, the 
subjects were again chosen at random from the remaining group of 
grocers, numbering seven hundred thirty. The results of this substitu- 
tion were rather remarkable. Of the original sixteen words and four 
phrases not understood, only two words and two phrases remained, 
after the synonyms had been substituted, about the meaning of which 
several grocers from the second group were still dubious. The original 
words not understood, together with the synonyms chosen to replace 
them when the test was given to the second group of grocers, are given 
in Table I. The synonyms not understood by subjects from the second
group of grocers are marked with an asterisk. The percentages of
grocers belonging to the second group who did not understand the
synonyms were all small, no synonym being misunderstood by more
than three per cent.

In order to check the frequency of occurrence of the words and
phrases not understood by a number of our subjects, and the synonyms
substituted for these words and phrases, the investigator looked them
up in Thorndike’s Teacher’s Word Book of 20,000 Words. The
frequency ratings are given in the table.¹ Except in three cases
(“upbraid,” “reluctant,” and “intellectual affairs”) Thorndike’s
frequency ratings substantiate our findings with respect to language
difficulties.

It is obvious, of course, that by substituting the above synonyms 
for words not understood by certain grocers, the meaning of the state- 
ments in the Bernreuter Personality Inventory, from which these 
words and phrases were culled, was disturbed. That could not be 
avoided. Neither does the investigator assume that the synonyms 
chosen by him are the most adequate that could have been found. 
All that is intended is to suggest a way in which the Bernreuter 
Personality Inventory might be revised in order to be applicable to 
men like retail grocers. If the Bernreuter Inventory is to be used with
success with adult people of a limited education, a revision is imperative.
And if such a revision is made, one of the most important
criteria of construction will be that “the propositions must be stated
in simple, clear, and direct language so that their meaning can be
grasped immediately.”


1 Thorndike explains his ratings as follows: “1a means (that the word appears)
in the first 500; 1b means in the second 500; 2a means in the third 500; 2b means
in the fourth 500, and so on with 3a, 3b, 4a, 5a, and 5b. 6 means in the sixth
thousand, 7 means in the seventh thousand, etc.”
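The rating scheme described in this footnote can be written as a small lookup from a word's frequency rank to its rating code. This is a sketch only: the function name is hypothetical, the rank is assumed to run from 1 (most frequent) to 20,000 (the size of the word book), and the half-thousand pattern is assumed to hold uniformly through the fifth thousand.

```python
def thorndike_rating(rank: int) -> str:
    """Rating code for a word at frequency rank `rank` (1 = most frequent)."""
    if not 1 <= rank <= 20000:
        raise ValueError("rank must lie between 1 and 20000")
    thousand = (rank - 1) // 1000 + 1
    if thousand <= 5:
        # First five thousands are split into halves of 500 words: a, then b.
        half = "a" if (rank - 1) % 1000 < 500 else "b"
        return f"{thousand}{half}"
    # From the sixth thousand on, the rating is the thousand itself.
    return str(thousand)


print(thorndike_rating(450), thorndike_rating(1700), thorndike_rating(8200))
```

On this reading, a rating of 1a marks a word among the five hundred most frequent, while a rating of 17 marks a word that first appears in the seventeenth thousand.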




BOOK REVIEWS 


E. G. WILLIAMSON and M. E. HAHN. Introduction to High-school
Counseling. New York: McGraw-Hill, 1940, pp. 314. 


Increasing doubt concerning the adequacy of the product which 
comes off the secondary-school assembly-line is currently reflected in 
an accelerated interest in high-school personnel work. This is an 
exceedingly important development. The curriculum and its method 
of presentation are already undergoing severe scrutiny, and gratifying 
progress is being made toward adjusting them more satisfactorily 
to the individual. It is high time, however, that we more carefully 
scrutinized the individual himself, for it is equally important that 
the student be adjusted to a curriculum. He has, to be sure, been 
rather extensively examined physically and intellectually, but we 
do not know him nearly well enough emotionally and socially. The 
search for that knowledge and the use of it in his educational, social, 
and vocational guidance is the essence of personnel work. At the
college level of education this task has already been taken seriously, 
and more or less extensive personnel organizations are operating, with, 
as we might expect, varying degrees of success. If, however, the 
high-school age is the formative period we have been led to believe 
it is, we cannot justify beginning a guidance program at the college 
level. And what of the great mass of youth who do not reach the 
college or the university? It may quite reasonably be argued that 
they represent potentially the most fruitful field for the application 
of effective guidance measures. 

Williamson and Hahn have done a real service to the unin- 
formed reader by writing a very comprehensive elementary survey 
of all the numerous ramifications of the personnel function in secondary 
schools. For the reader who wishes to be ‘“‘introduced”’ to the field, 
the book provides an adequate and useful orientation; and the high- 
school teacher or administrator whose acquaintance with the modern 
guidance movement is very limited will no doubt find quite informa- 
tive this enumeration of the various sorts of personnel activity, the 
effective methods of operating them, and their appropriate dis- 
tribution among the members of the school community. The book, 
however, is not intended for the advanced student; and since it is 
neither evaluative nor critical, scarcely warrants even for the “beginner”
the intensive reading and study which the authors would seem



to suggest by their supplementing of each chapter with a long list 
of “Review and Discussion Questions.”

The authors’ estimation, implied rather than expressed, of the 
relative importance of the different guidance functions—educational, 
vocational, emotional, social—is entirely reasonable. They are at 
no time propagandists for any special aspect of the program or any 
particular form of personnel organization or administration. They 
appreciate the desirability of expert training in the more specialized 
kinds of counseling and warn against the adding of personnel duties 
to already overburdened teachers and administrators. Yet, they 
recognize the necessity of the gradual building of guidance programs 
through the use of available personnel and facilities. They are quite 
cognizant of the importance of “selling” the program to skeptical members
of the community through tactful persuasion rather than pressure.

One wishes the authors did not re-trace so many times the same 
problems and the same methods of dealing with them. Each of the 
first three chapters—“The Development of Student Personnel
Work,” “Students’ Problems and Personnel Work,” and “The
Scope of Personnel Work”—covers some of the same ground from
but slightly different angles, and then much of it is gone over once
again in Chapter VIII, “Counseling Students.” The section on
Counseling Techniques in Chapter VIII merely rephrases what 
has already been said in the immediately preceding chapter on “Col- 
lecting Information for Counseling.” It is regrettable that more 
of the authors’ wide counseling experience has not been incorporated 
in their book, for their presentation and discussion of case histories 
in the chapter on “Counselors at Work” is the most illuminating
portion of the volume for the student of personnel problems and 
techniques. Their chapters on the administration and development 
of a personnel program are comprehensive and practical. 

CARLETON F. SCOFIELD.


University of Buffalo. 


ARNOLD GESELL. Wolf Child and Human Child—A Narrative Inter- 
pretation of the Life History of Kamala, the Wolf Girl. New 
York: Harper and Bros., 1941, pp. 107. 


The age-old controversy of Nature versus Nurture, when raised 
in any meeting of psychologists or educators, always finds a sufficient 
number of proponents on each side to insure a heated discussion. 




Scientific findings disclosed through recent studies have only served 
to fan the flame still brighter, because students of this problem have 
tended to examine the experimental procedures and statistical accuracy 
of the opposition with much more critical eyes than they examine 
their own. In this volume Gesell presents material which should 
cause members of these two groups to view the question of innate and 
acquired abilities more dispassionately. 

This book is based on a diary account, kept by Reverend J. A. L.
Singh, of the daily activities of Kamala, the girl who was carried to a 
wolf den when only a few months of age and later captured from 
these four-footed foster parents at approximately eight years of age. 
The recordings in the diary begin in October, 1920, when Dr. Singh 
with the help of native guides captured the “man-ghosts” Kamala
and Amala (the latter a much younger girl who died soon after her 
capture) a few miles south of his mission in Midnapore, India. 

Actually nothing is known concerning the factors surrounding 
Kamala’s capture by the wolves and her life in the den. However, 
Gesell has done a stimulating task of reconstructing the first few 
years from the evidence at hand. He vividly depicts the manner 
in which she was taken by the she-wolf and the adaptation which 
this human infant made to the wolf culture as evidenced by her 
travelling on all fours, pinioning food with her hands, lapping water 
like an animal and running with the pack at night. Truly this child 
had been confronted “with a monstrously exceptional situation and
solved it within her capacities as a human being.” 

Kamala was about eight years of age (and Amala about one and 
one-half) when recaptured and taken to the mission. Here Mrs. 
Singh exerted a profound influence toward developing a sense of 
security in this wolf-child by continued massage and physical treat- 
ment. However, the environmental effects of wolf ways were retained 
for more than two years, during which time Kamala continued to 
resort to the quadrupedal method of locomotion, growl like a dog,
bare her teeth at other children and lap water from a pan. Slowly 
the strength of human heredity, aided by the favorable environment 
of the mission, exerted itself and in 1926 Kamala first walked on two 
feet and progressed to speaking in short sentences. From this time 
to her death in 1929 rapid progress was made toward human nor- 
mality as evidenced by a growing vocabulary, increasing self-reliance, 
desire to be in social groups and ability to care for the younger children. 

The extremely interesting manner in which the history of Kamala




has been presented serves as an excellent prelude to the questions 
which Gesell raises and discusses concerning the interdependence of 
hereditary and environmental factors. These questions are: (1) 
Did Kamala’s Wild Life Modify her Physique? (2) Was Kamala’s 
Brain Affected by Her Abnormal Experiences? (3) Was Kamala 
Mentally Deficient? (4) Was Kamala Psychopathic? (5) Can 
Wolf Ways be Humanized? (6) What if Kamala Had Lived Longer? 
The possible explanations which the author offers to these questions 
serve to illustrate that there is no sharp dichotomy existing between 
Nature and Nurture, for “we are not dealing with two sets of competing
and incompatible forces but with a physiological process which
brings them into mutual interaction.” 

The author in reconstructing the first few years of Kamala’s life 
from his imagination and the latter years from recordings in a diary 
presents a scholarly but intriguing study which will be welcomed by 
psychologists and students of human growth as a valuable addition 
to the existing literature in this field. LEO F. SMITH.

Rochester Athenaeum and Mechanics Institute. 


ROBERT L. THORNDIKE. Children’s Reading Interests: A Study Based
on a Fictitious Annotated Titles Questionnaire. New York: 
Bureau of Publications, Teachers College, Columbia University, 
1941, pp. 48. 


This bulletin is based on data yielded by a fictitious annotated 
titles questionnaire (after Tyler and Waples) which was administered 
to approximately three thousand boys and girls in grades four through 
twelve. The questionnaire included eighty-eight items (six of them 
“ringers”) and is reproduced in full as an appendix. The study
indicated that “there is a consistent pattern of boy-interests (and
aversions) and, to a somewhat lesser extent, a pattern of girl-interests
cutting across all age and intelligence differences. . . . The ten-year-
old boy and the fifteen-year-old boy are much more alike than different
in their interests” (p. 35). Similarly, Thorndike found that the effect
of brightness upon topics of reading interest was slight. The bright 
children checked as worth reading a greater number of titles, but these 
did not seem to be predominantly scholarly or bookish in nature. Sex 
was the most significant determining factor so far as the character of 
the reading interest was concerned. STEPHEN M. Corey. 

University of Chicago. 




NAGENDRA NATH CHAUDHURY. Pragmatism and Pioneering in
Benoy Sarkar’s Sociology and Economics. Calcutta: Chucker- 
vertty Chatterjee and Co. Ltd., 1940, pp. 152. 


Benoy Sarkar has been an actively practical scholar in India since 
the beginning of the Twentieth Century. His books and articles show 
a catholic interest in the whole area of social science. Unlike Gandhi 
it has been the fundamental purpose of Sarkar to show to Indians the 
position of India in the modern world, its strengths and weaknesses, 
and to suggest means whereby it may progress. The interest of 
educators in this scholar is in his writings on educational theory and 
practice for India. 

The author of the present volume is primarily concerned with an 
interpretation of Sarkar’s social and economic theories, but refers to 
his work in education. Chaudhury is a loyal disciple; while reading 
one sometimes feels that Sarkar is too perfect, but none the less the 
author’s enthusiasm creates a real interest in the subject of the critique. 

C. M. LOUTTIT.


Indiana University. 


ARTHUR E. TRAXLER. Ten Years of Research in Reading. New York:
Educational Records Bulletin No. 32, Educational Records 
Bureau, 1941, pp. 195. (lithoprinted). 


This monograph includes a summary (forty-three pages) and 
annotations (one hundred thirty-seven pages) of some six hundred 
twenty reading researches. Both the summary and the titles in the 
annotated bibliography are organized under nineteen headings such 
as: Reading readiness and beginning reading; reading in the content 
subjects; phonics; eye movements and reading ability; activity pro- 
grams and reading achievement; reading bibliographies and summaries. 
There is a complete alphabetical index of authors and a very brief 
subject index. Because the summary consists of very brief statements 
about each of the titles in the bibliography, it duplicates the latter to a 
large extent. The annotations average about thirty-eight words. 
Although neither the summary statements nor the annotations are 
critical, the bibliography is complete and will be used frequently by 
persons interested in reading research. 

STEPHEN M. COREY. 

University of Chicago. 




SADIE GOGGANS. Units of Work and Centers of Interest in the Organization 
of the Elementary-school Curriculum. New York: Bureau of 
Publications, Teachers College, Columbia University, 1941, pp. 
140. 


This book will be of primary interest to educators and others 
concerned with the problems of curriculum construction in the elementary 
school. It is a critical and scholarly study of “the two antithetical 
schools of thought in education which influence the organization of 
the elementary-school curriculum.” These two antithetical schools 
determine two types of curriculum: “units of work” and “centers of 
interest.” The first type stresses organized subject-matter, is based 
on the priority of thought, and is interested in the conservation of 
society. The second type focuses units of learning experience on 
functional aspects of child living, is based on the priority of experience, 
and is interested in the improvement of society. 

Chapter V, Priority of Thought or Experience, presents an excellent 
discussion of the dynamics of the learning process and elaborates a 
point of view that is, unfortunately, rarely found in most treatments 
of this most fundamental of psychological functions. This chapter 
should be required reading for any course, whatever its name might be, 
in the psychology of learning. 

There is a bibliography of three hundred thirty-nine titles. 

STANLEY G. DULSKY. 


Rochester Guidance Center. 


MRS. ST. LOE STRACHEY. Borrowed Children. New York: The 
Commonwealth Fund, 1940, pp. 149. 


This book deals with the adjustment problems arising among 
children during the early months of the 1939 evacuation from the 
larger cities in England. “Evacuation is another word for dislocation.” 
The emotional maladjustments caused or accentuated 
by the dislocation in the children’s lives are described. By citation 
of case histories and summarization of the implications, the problems 
are analyzed and the programs of mental hygiene employed are 
described. In addition there is an attempt to enumerate educational 
problems which have arisen. Although there is some evaluation 
of the implications of these educational problems, the solutions were 
not in sight. 




The early journalistic reports of the evacuated children (incorrigi- 
bility, delinquency, filth, etc.) were alarming. Sympathetic analysis 
of the situation by child guidance experts and social workers revealed 
that much of this behavior was due to emotional maladjustment 
arising from the dislocation related to moving from home environ- 
ments to the country. The improved adjustment brought about 
through guidance under the direction of mental hygiene experts 
usually was accompanied by cessation of bed-wetting and many 
other forms of undesirable behavior. 

Diagnosis of maladjustment and specification of remedial programs 
were based largely upon the views of Dr. Moodie. The common 
origin of the difficulties was considered to lie in fear and anxiety. 
The children need security, love and understanding. Frequently 
the analyses involved a psychoanalytical approach. 

The general reader will find this book interesting and informative. 
Its main value, however, lies in its implications for educators, social 
agencies, child clinics and other agencies which will have to face 
similar emergencies involving dislocations in children’s lives. 

MILES A. TINKER. 


University of Minnesota. 




